Remove development diagnostics and clean up codebase

Strip out watchpoint system (thunkSetWatch, WATCH-PRE/POST), INT 10h
entry counters, segment verification dumps, and inline extern
declarations. Add neSetDebug to neload.h, correct -fno-gcse comment,
and add project CLAUDE.md for git-tracked notes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scott Duensing 2026-02-21 18:27:00 -06:00
parent 628ef231b9
commit 431f573422
6 changed files with 117 additions and 178 deletions

CLAUDE.md (new file, 107 lines)

@ -0,0 +1,107 @@
# Win31drv Project Memory
## Build Environment
- DJGPP cross-compiler: `~/djgpp/djgpp/bin/i586-pc-msdosdjgpp-gcc` (GCC 12.2.0)
- DJGPP binutils need `libfl.so.2`: stored in `tools/lib/` (Makefiles set LD_LIBRARY_PATH)
- CWSDPMI zip stored in `tools/cwsdpmi.zip` (extracted to bin/ during build)
- DOSBox-X: `flatpak run com.dosbox_x.DOSBox-X` (installed as user flatpak)
- CWSDPMI.EXE in `bin/` directory for DPMI support under DOSBox-X
- Config: `dosbox-x.conf` with S3 Trio64 machine type, 64MB RAM
## Project Structure
```
windriver/
├── Makefile # Top-level: builds demo, calls win31drv/Makefile
├── demo.c # Demo program
├── dosbox-x.conf # DOSBox-X config (S3 SVGA)
├── obj/ # Demo object files
├── bin/ # Executables + CWSDPMI.EXE
└── win31drv/ # Library
    ├── Makefile # Builds libwindrv.a
    ├── obj/ # Library objects
    ├── neload.c/h # NE format loader
    ├── neformat.h # NE structures
    ├── thunk.c/h # 32→16 bit thunking
    ├── windrv.c/h # Main API
    ├── winstub.c/h # Windows API stubs
    ├── winddi.h # DDI structures
    └── wintypes.h # Win16 types
```
## DJGPP Portability Notes
- `uint32_t` is `unsigned long` (not `unsigned int`) in DJGPP — use `PRIu32`/`PRIX32` from `<inttypes.h>`
- Always include `<stdarg.h>` explicitly for `va_list`/`va_start`/`va_end`
- Headers must be self-contained (include their own dependencies)
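
A minimal sketch of the first two notes, assuming nothing about the project's own logging code. The helper names `formatInto` and `formatLinear` are hypothetical and exist only for illustration.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: printf-style formatting into a caller-supplied buffer.
// <stdarg.h> is included explicitly instead of relying on <stdio.h> to pull it in.
static int formatInto(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return n;
}

// Hypothetical helper: format a 32-bit linear address.
// PRIX32 expands to the right length modifier whether uint32_t is
// unsigned long (as on DJGPP) or unsigned int, so no cast is needed.
static int formatLinear(char *buf, size_t size, uint32_t addr)
{
    return formatInto(buf, size, "0x%08" PRIX32, addr);
}
```

Writing `"%08X"` here instead would trigger a format warning under `-Wall` on DJGPP, because the argument is an `unsigned long`.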
## Thunking Architecture Notes
- **SS == DS == DGROUP**: Win3.x drivers assume SS == DS == DGROUP. VBESVGA's BBLT.ASM does
`PrestoChangeoSelector(SS, WorkSelector)` to create a code alias for compiled blit code.
thunkCall16 uses dgroupSel as SS (SP=0xFFF0) when available. Without this, the code alias
has the wrong base and the CPU executes garbage.
- **Register corruption with -O2 inlining**: When demo.c's demoDrawing is inlined into main,
DJGPP GCC 12.2.0 mishandles callee-saved registers across thunk calls in long functions.
Fix: `__attribute__((noinline))` on demoDrawing. Symptom: handle pointer corrupted to
a ColorInfo return value (e.g. 0xFF0001F6) between Demo 2 and Demo 3.
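
The `noinline` fix can be sketched as follows. `drawAllDemos` is a hypothetical stand-in for `demoDrawing`, and its body is a placeholder rather than the project's drawing code.

```c
#include <stdint.h>

// Hypothetical stand-in for a long drawing routine such as demoDrawing.
// noinline asks GCC to keep this as a separate function, so it is never
// merged into its caller's body at -O2.
__attribute__((noinline))
static uint32_t drawAllDemos(uint32_t handle)
{
    // Real code would make many thunk calls here; this sketch only
    // returns the handle so the call has an observable result.
    return handle;
}
```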
## DOSBox-X Driver Notes
- `waitForEngine()`: GP_STAT port 0x9AE8 bit 9 polling — S3 only (gIsS3 guard)
- **S3 detection**: Probe CR30 chip ID register. S3 chips: 0x81-0xE1. ET4000: 0x00.
Only apply S3-specific setup (cursor disable, dispYOffset, setDisplayStart) when isS3=true
AND driver is not VGA-class (1bpp/4planes).
- **Pattern scratch artifact**: S3 driver writes 8x8 dithered brush pattern to VRAM at fixed
position (~(144,1)-(151,8)) during accelerated pattern fills. Fixed by shifting CRTC display
start down 10 scanlines (`dispYOffset`) so the scratch area is off-screen.
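- **`-fno-gcse` required for windrv.c**: With -O2 GCSE enabled, GCC places stack locals and
register spills at addresses that get corrupted during 16-bit driver calls via thunkCall16,
producing wrong drawing parameters. Only windrv.c needs this. See `WINDRV_CFLAGS` in win31drv/Makefile.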
- Output DDI (polylines/rectangles) requires a **physical pen** from RealizeObject, not a
raw LogPen16T. The pen must be in DGROUP (same as brush, drawMode, PDEVICE).
- `wdrvUnloadDriver` does NOT auto-call Disable — caller must handle text mode restore
- `sleep()` hangs under DOSBox-X because BIOS timer ticks don't advance without I/O
- Debug output: `-d` flag enables verbose logging in neload, winstub, thunk, and windrv
- Known issue: mode mismatch HW=800x600 vs GDIINFO=640x480
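
The S3 detection rule in the notes above can be sketched as a pair of pure predicates. This is an illustration under stated assumptions: the function names are hypothetical, the chip ID range is the one recorded above, and reading CR30 through the CRTC ports is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

// Chip ID range for S3 parts as recorded in the notes (CR30 = 0x81..0xE1).
#define S3_CHIP_ID_MIN 0x81
#define S3_CHIP_ID_MAX 0xE1

// Hypothetical helper: true when the CR30 value falls in the S3 range.
// An ET4000 reports 0x00 and is therefore rejected.
static bool chipIdIsS3(uint8_t cr30)
{
    return cr30 >= S3_CHIP_ID_MIN && cr30 <= S3_CHIP_ID_MAX;
}

// Hypothetical helper: "VGA-class" means 1 bit per pixel across 4 planes.
static bool driverIsVgaClass(uint16_t bitsPixel, uint16_t planes)
{
    return bitsPixel == 1 && planes == 4;
}

// S3-specific setup (cursor disable, dispYOffset, setDisplayStart) applies
// only when the hardware is S3 AND the driver is not VGA-class.
static bool shouldApplyS3Setup(uint8_t cr30, uint16_t bitsPixel, uint16_t planes)
{
    return chipIdIsS3(cr30) && !driverIsVgaClass(bitsPixel, planes);
}
```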
## DGROUP Stack Management
- VGA.DRV ships with DGROUP[0x0A]=0xFFFF (stack bottom = top of segment → no stack).
Its BitBlt prolog calls a stack check function at 0x18CA that compares available stack
against [SS:0x0A]. With 0xFFFF, ALL functions fail immediately (return 0).
- Fix: patch [0x0A] to objBase after extending DGROUP. Only done when original = 0xFFFF.
- S3TRIO.DRV and VBESVGA.DRV have [0x0A]=0x0000 — no patching needed.
- **Do NOT unconditionally overwrite DGROUP offsets 0x00-0x0F** — VBESVGA.DRV stores
driver-specific data there (0x030A at offset 0, 0x01 at offset 4).
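
A sketch of the conditional patch described above, operating on an in-memory copy of DGROUP. The function name and the `newBottom` parameter (standing in for objBase) are hypothetical, and the word at offset 0x0A is assumed to be 16-bit little-endian.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DGROUP_STACK_BOTTOM_OFF 0x0A   // Offset of the stack-bottom word
#define DGROUP_NO_STACK         0xFFFF // Value VGA.DRV ships with

// Hypothetical helper: patch the stack-bottom word only when the driver
// shipped with 0xFFFF. Every other byte in 0x00-0x0F is left untouched,
// because some drivers keep their own data there.
// Returns true when the word was patched.
static bool patchStackBottom(uint8_t *dgroup, size_t dgroupSize, uint16_t newBottom)
{
    if (dgroupSize < DGROUP_STACK_BOTTOM_OFF + 2) {
        return false;
    }
    uint16_t current = (uint16_t)(dgroup[DGROUP_STACK_BOTTOM_OFF] |
                                  (dgroup[DGROUP_STACK_BOTTOM_OFF + 1] << 8));
    if (current != DGROUP_NO_STACK) {
        return false;
    }
    dgroup[DGROUP_STACK_BOTTOM_OFF]     = (uint8_t)(newBottom & 0xFF);
    dgroup[DGROUP_STACK_BOTTOM_OFF + 1] = (uint8_t)(newBottom >> 8);
    return true;
}
```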
## BitBlt Source Device Rules
- For pattern-only ROPs (PATCOPY=0xF0, BLACKNESS=0x00, WHITENESS=0xFF, etc.),
lpSrcDev must be NULL (0:0) per DDI spec. VGA.DRV rejects non-NULL source for
pattern-only ROPs. S3TRIO.DRV tolerates it but correct behavior is NULL.
- Source dependency check: `ropNeedsSrc = (((rop8 >> 2) ^ rop8) & 0x33) != 0`
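
The source dependency check can be written as a standalone function. The expression is taken from the note above; `rop8FromRop32` is a hypothetical helper that assumes the usual ROP3 layout, with the 8-bit truth table in bits 16-23 of the 32-bit raster operation code.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: extract the 8-bit truth table from a 32-bit ROP3 code
// (for example 0xCC from SRCCOPY = 0x00CC0020).
static uint8_t rop8FromRop32(uint32_t rop32)
{
    return (uint8_t)((rop32 >> 16) & 0xFF);
}

// Bit i of rop8 is the result for input combination i = P*4 + S*2 + D.
// Shifting right by 2 lines each S=1 entry up with its S=0 partner, and
// the mask 0x33 keeps the four S=0 positions. Any difference means the
// result changes with the source bit, so the ROP needs a source.
static bool ropNeedsSrc(uint8_t rop8)
{
    return (((rop8 >> 2) ^ rop8) & 0x33) != 0;
}
```

With this check, PATCOPY (0xF0), BLACKNESS (0x00), and WHITENESS (0xFF) report no source dependency, while SRCCOPY (0xCC) does.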
## INT 10h ES Translation
- Different INT 10h function families use different ES:offset registers:
VBE 4Fxx → ES:DI, AH=10h (palette) → ES:DX, AH=11h (fonts) → ES:BP, AH=1Bh → ES:DI
- Only specific AL subfunctions use ES as a buffer pointer; most don't
- Copy sizes must be exact (17 bytes for palette, CX*3 for DAC blocks, etc.)
- Copy direction matters: "Set" = copy-in only, "Read/Get" = copy-out only
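
For the AH=10h palette family, the size and direction rules might be tabulated as below. This is a sketch, not the project's handler: the type and function names are hypothetical, and the subfunction numbers follow the standard VGA BIOS conventions (AL=02h/09h for the 17-byte palette table, AL=12h/17h for DAC blocks of CX entries) rather than anything shown in this commit.

```c
#include <stdint.h>

// Hypothetical types describing how an ES:DX buffer must be handled.
typedef enum {
    ES_COPY_NONE, // Subfunction does not use ES as a buffer pointer
    ES_COPY_IN,   // "Set": copy caller's buffer in before the call
    ES_COPY_OUT   // "Read/Get": copy BIOS result out after the call
} EsCopyDirT;

typedef struct {
    EsCopyDirT dir;
    uint32_t   bytes;
} EsCopyRuleT;

// Hypothetical lookup for INT 10h AH=10h. al is the subfunction and cx the
// DAC register count (only meaningful for the block subfunctions).
static EsCopyRuleT paletteCopyRule(uint8_t al, uint16_t cx)
{
    EsCopyRuleT rule = { ES_COPY_NONE, 0 };
    switch (al) {
        case 0x02: rule.dir = ES_COPY_IN;  rule.bytes = 17; break;               // Set all palette registers
        case 0x09: rule.dir = ES_COPY_OUT; rule.bytes = 17; break;               // Read all palette registers
        case 0x12: rule.dir = ES_COPY_IN;  rule.bytes = (uint32_t)cx * 3; break; // Set block of DAC registers
        case 0x17: rule.dir = ES_COPY_OUT; rule.bytes = (uint32_t)cx * 3; break; // Read block of DAC registers
        default:   break;                                                        // Most subfunctions: no buffer
    }
    return rule;
}
```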
## WINFLAGS Handling
- **WF_80x87 NOT used**: We don't save/restore FPU state across thunk boundaries
- **VGA-class drivers need WF_STANDARD**: VGA.DRV's physical_enable hangs in Enhanced
mode (polls VDD that doesn't exist). Auto-detected after Enable(style=1) returns
1bpp/4planes GDIINFO → repatch __WINFLAGS in all segments (0x0025→0x0015).
- SVGA drivers (S3TRIO, VBESVGA) use WF_ENHANCED normally
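
The two `__WINFLAGS` values above decompose into standard Windows 3.x `WF_*` bits. A short sketch, with `winFlagsFor` as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

// Flag values as defined by the Windows 3.x SDK for GetWinFlags().
#define WF_PMODE    0x0001 // Protected mode
#define WF_CPU386   0x0004 // 80386 CPU
#define WF_STANDARD 0x0010 // Standard mode
#define WF_ENHANCED 0x0020 // 386 Enhanced mode
// WF_80x87 (0x0400) is deliberately never set.

// Hypothetical helper: choose the value to patch into __WINFLAGS.
// VGA-class drivers get Standard mode, everything else Enhanced mode.
static uint16_t winFlagsFor(bool vgaClass)
{
    uint16_t flags = WF_PMODE | WF_CPU386;
    flags |= vgaClass ? WF_STANDARD : WF_ENHANCED;
    return flags;
}
```

Here `winFlagsFor(false)` yields 0x0025 and `winFlagsFor(true)` yields 0x0015, matching the repatch described above.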
## ET4000 Driver Notes
- ET4000.DRV from Win 3.x distribution is SZDD-compressed; decompress with `msexpand`
(rename to .DR_ first, output is .DR without the V — rename to .DRV)
- DOSBox-X machine type: `svga_et4000` for ET4000 hardware emulation
- ET4000 is 640x480 8bpp, software-rendered (no accelerator engine in DOSBox-X)
- CR30=0x00 on ET4000 → isS3=false → no S3 engine wait, no display start shift
## Current Demo Status
- S3TRIO.DRV, VBESVGA.DRV, VGA.DRV, ET4000.DRV all work: Load → Enable → Draw → Disable → Unload
- Demo 1: Fill rectangles (BitBlt) — works
- Demo 2: Pixel patterns (Pixel) — works
- Demo 3: Lines/starburst (Output/Polyline) — works
- Demo 4: Screen-to-screen blit (BitBlt SRCCOPY) — works
- VGA.DRV: 640x480 4-plane 16-color mode; limited color palette but functional
- ET4000.DRV: 640x480 8bpp on svga_et4000; software-only, no hw acceleration
- Drivers stored in `drivers/` directory, copied to `bin/` during build

win31drv/Makefile

@ -8,13 +8,12 @@ CC = $(DJGPP_PREFIX)/bin/i586-pc-msdosdjgpp-gcc
AR = $(DJGPP_PREFIX)/bin/i586-pc-msdosdjgpp-ar
CFLAGS = -Wall -Wextra -O2 -std=gnu99 -I.
-# DOSBox-X's S3 Trio64 emulation corrupts specific memory addresses
-# during 16-bit driver calls via thunkCall16. With -O2 GCSE enabled,
-# the code layout places stack locals and register spills at addresses
-# that overlap the corruption targets, causing wrong values in drawing
-# parameters. Disabling GCSE for windrv.c changes the layout enough
-# to avoid the overlap. Only windrv.c is affected (it has the drawing
-# functions that call thunkCall16 with interleaved parameter setup).
+# With -O2 GCSE enabled, GCC's code layout for windrv.c places stack
+# locals and register spills at addresses that get corrupted during
+# 16-bit driver calls via thunkCall16, causing wrong values in drawing
+# parameters. Disabling GCSE changes the layout enough to avoid the
+# issue. Only windrv.c is affected (it has the drawing functions that
+# call thunkCall16 with interleaved parameter setup).
WINDRV_CFLAGS = $(CFLAGS) -fno-gcse
# DJGPP binutils need libfl.so.2 which may not be installed system-wide

win31drv/neload.h

@ -112,4 +112,7 @@ bool neExtendSegment(NeModuleT *mod, int segIdx, uint32_t extraSize, uint32_t *o
// Debug: dump module information to stderr.
void neDumpModule(const NeModuleT *mod);
// Enable or disable verbose debug output.
void neSetDebug(bool enable);
#endif // NELOAD_H

win31drv/thunk.c

@ -54,7 +54,6 @@
#include "thunk.h"
#include "log.h"
#include <sys/farptr.h>
// Forward declarations
static bool installRelayCode(ThunkContextT *ctx);
@ -419,12 +418,6 @@ static __dpmi_paddr gOldCbVec;
static volatile bool gHandlerInstalled = false;
static bool gThunkDebug = false;
// Diagnostic: monitor 3 bytes at gDiagWatchSel:gDiagWatchOff for corruption.
// Set gDiagWatchSel nonzero to enable. Logs when bytes change.
static uint16_t gDiagWatchSel = 0;
static uint32_t gDiagWatchOff = 0;
static uint8_t gDiagWatchBytes[3] = {0};
// Shared area for passing parameters from the interrupt handler
static uint16_t gCbParams[THUNK_MAX_PARAMS];
static uint32_t gCbRetVal;
@ -459,19 +452,8 @@ uint8_t gCbStack[16384] __attribute__((aligned(16)));
uint32_t gCbStackTop;
// Worker function called from the assembly stub.
// Diagnostic: set by cbIntWorker to prove it was called
volatile uint32_t gCbWorkerCalled = 0;
volatile uint32_t gCbWorkerLastSS = 0;
volatile uint32_t gCbWorkerLastESP = 0;
volatile uint32_t gCbWorkerLastSlot = 0xDEAD;
void cbIntWorker(CbFrameT *frame)
{
gCbWorkerCalled++;
gCbWorkerLastSS = gCbSavedSS;
gCbWorkerLastESP = gCbSavedESP;
gCbWorkerLastSlot = (uint16_t)frame->ebx;
uint16_t slot = (uint16_t)frame->ebx;
if (slot >= gCallbackCount || !gCallbacks[slot]) {
@ -531,39 +513,12 @@ void cbIntWorker(CbFrameT *frame)
fflush(stderr);
}
// Diagnostic: check BEFORE callback dispatch
if (gDiagWatchSel != 0) {
uint8_t b0 = _farpeekb(gDiagWatchSel, gDiagWatchOff);
if (b0 != gDiagWatchBytes[0]) {
logErr("WATCH-PRE: %04X:%08" PRIX32 " changed %02X->%02X before CB[%u]\n",
gDiagWatchSel, gDiagWatchOff,
gDiagWatchBytes[0], b0, slot);
gDiagWatchBytes[0] = b0;
}
}
gCbRetVal = gCallbacks[slot](gCbParams, paramWords);
fflush(stderr);
// Set return value in DX:AX
frame->eax = (frame->eax & 0xFFFF0000) | (gCbRetVal & 0xFFFF);
frame->edx = (frame->edx & 0xFFFF0000) | (gCbRetVal >> 16);
// Diagnostic: check AFTER callback dispatch
if (gDiagWatchSel != 0) {
uint8_t b0 = _farpeekb(gDiagWatchSel, gDiagWatchOff);
uint8_t b1 = _farpeekb(gDiagWatchSel, gDiagWatchOff + 1);
uint8_t b2 = _farpeekb(gDiagWatchSel, gDiagWatchOff + 2);
if (b0 != gDiagWatchBytes[0] || b1 != gDiagWatchBytes[1] || b2 != gDiagWatchBytes[2]) {
logErr("WATCH-POST: %04X:%08" PRIX32 " changed %02X %02X %02X->%02X %02X %02X after CB[%u]\n",
gDiagWatchSel, gDiagWatchOff,
gDiagWatchBytes[0], gDiagWatchBytes[1], gDiagWatchBytes[2],
b0, b1, b2, slot);
gDiagWatchBytes[0] = b0;
gDiagWatchBytes[1] = b1;
gDiagWatchBytes[2] = b2;
}
}
}
// Defined in the file-scope asm block below
@ -590,11 +545,6 @@ __asm__(
" movw %cx, %fs\n"
" movl %eax, %fs:_gCbSavedFS\n"
// Diagnostic: increment gCbWorkerCalled via FS to prove entry
" movl %fs:_gCbWorkerCalled, %eax\n"
" incl %eax\n"
" movl %eax, %fs:_gCbWorkerCalled\n"
" popl %ecx\n"
" popl %eax\n"
@ -699,20 +649,6 @@ void thunkSetDebug(bool debug)
}
void thunkSetWatch(uint16_t sel, uint32_t off)
{
gDiagWatchSel = sel;
gDiagWatchOff = off;
if (sel != 0) {
gDiagWatchBytes[0] = _farpeekb(sel, off);
gDiagWatchBytes[1] = _farpeekb(sel, off + 1);
gDiagWatchBytes[2] = _farpeekb(sel, off + 2);
logErr("WATCH: set %04X:%08" PRIX32 " = %02X %02X %02X\n",
sel, off, gDiagWatchBytes[0], gDiagWatchBytes[1], gDiagWatchBytes[2]);
}
}
bool thunkInit(ThunkContextT *ctx)
{
memset(ctx, 0, sizeof(ThunkContextT));
@ -748,13 +684,8 @@ bool thunkInit(ThunkContextT *ctx)
uint32_t dosBase = (uint32_t)dosSeg * 16;
-logErr("thunk: DOS mem at 0x%05" PRIX32 "-0x%05" PRIX32,
+logErr("thunk: DOS mem at 0x%05" PRIX32 "-0x%05" PRIX32 "\n",
 dosBase, dosBase + totalSize - 1);
-if (0x8134 >= dosBase && 0x8134 < dosBase + totalSize) {
-logErr(" ** 0x8134 INSIDE thunk block at offset 0x%04" PRIX32 " **",
-(uint32_t)(0x8134 - dosBase));
-}
-logErr("\n");
// Zero the entire area
{
@ -942,16 +873,6 @@ uint32_t thunkCall16(ThunkContextT *ctx, uint16_t targetSel, uint16_t targetOff,
uint16_t dgroupSel = relayConfig.dgroupSel;
gCbDgroupSel = dgroupSel;
// Diagnostic: check watched byte before lcall
if (gDiagWatchSel != 0) {
uint8_t b = _farpeekb(gDiagWatchSel, gDiagWatchOff);
if (b != gDiagWatchBytes[0]) {
logErr("WATCH-LCALL-PRE: %02X->%02X target=%04X:%04X\n",
gDiagWatchBytes[0], b, targetSel, targetOff);
gDiagWatchBytes[0] = b;
}
}
__asm__ volatile (
// Save ES, GS, and FS
"push %%es\n\t"
@ -980,16 +901,6 @@ uint32_t thunkCall16(ThunkContextT *ctx, uint16_t targetSel, uint16_t targetOff,
: "ebx", "ecx", "edx", "esi", "edi", "memory", "cc"
);
// Diagnostic: check watched byte after lcall returns
if (gDiagWatchSel != 0) {
uint8_t b = _farpeekb(gDiagWatchSel, gDiagWatchOff);
if (b != gDiagWatchBytes[0]) {
logErr("WATCH-LCALL-POST: %02X->%02X target=%04X:%04X\n",
gDiagWatchBytes[0], b, targetSel, targetOff);
gDiagWatchBytes[0] = b;
}
}
return result;
}

win31drv/thunk.h

@ -85,9 +85,6 @@ bool thunkInit(ThunkContextT *ctx);
// Enable or disable verbose callback tracing.
void thunkSetDebug(bool debug);
// Set a watchpoint on 3 bytes at sel:off. Logs any changes during callbacks.
void thunkSetWatch(uint16_t sel, uint32_t off);
// Shut down the thunking infrastructure and free resources.
void thunkShutdown(ThunkContextT *ctx);

win31drv/windrv.c

@ -174,11 +174,6 @@ uint32_t gInt10hSavedFS; // Interrupted FS
uint8_t gInt10hStack[4096] __attribute__((aligned(16))); // Handler stack
uint32_t gInt10hStackTop; // Top of handler stack
// Diagnostic: count INT 10h handler entries
volatile uint32_t gInt10hEntryCount = 0;
volatile uint32_t gInt10hLastSS = 0;
volatile uint32_t gInt10hLastESP = 0;
// ============================================================================
// Exception capture - captures primary fault CS:EIP before DJGPP's handler
// (which may itself crash handling exceptions from 16-bit code).
@ -502,15 +497,6 @@ __asm__(
" movw %cx, %fs\n"
" movl %eax, %fs:_gInt10hSavedFS\n"
// Diagnostic: increment entry counter, save SS and ESP
" movl %fs:_gInt10hEntryCount, %eax\n"
" incl %eax\n"
" movl %eax, %fs:_gInt10hEntryCount\n"
" xorl %eax, %eax\n"
" movw %ss, %ax\n"
" movl %eax, %fs:_gInt10hLastSS\n"
" movl %esp, %fs:_gInt10hLastESP\n"
" popl %ecx\n"
" popl %eax\n"
@ -1100,7 +1086,6 @@ WdrvHandleT wdrvLoadDriver(const char *driverPath)
// Load the NE module
if (gDebug) {
extern void neSetDebug(bool enable);
neSetDebug(true);
}
@ -1171,57 +1156,6 @@ WdrvHandleT wdrvLoadDriver(const char *driverPath)
return NULL;
}
// Verify segment integrity after loading
if (drv->ddiEntry[DDI_ORD_ENABLE].present) {
uint16_t codeSel = drv->ddiEntry[DDI_ORD_ENABLE].sel;
uint16_t codeOff = drv->ddiEntry[DDI_ORD_ENABLE].off;
// Find the segment's stored linear address
int segIdx = drv->neMod.exports[DDI_ORD_ENABLE].segIndex - 1;
uint32_t storedLinear = drv->neMod.segments[segIdx].linearAddr;
// Read actual descriptor base from DPMI
uint32_t descBase = 0;
__dpmi_get_segment_base_address(codeSel, (unsigned long *)&descBase);
// Read via flat pointer (linearAddr + offset)
uint8_t *flatPtr = (uint8_t *)(storedLinear + codeOff);
uint8_t flatBytes[8];
for (int i = 0; i < 8; i++) {
flatBytes[i] = flatPtr[i];
}
// Read via far pointer (selector:offset)
uint8_t farBytes[8];
for (int i = 0; i < 8; i++) {
farBytes[i] = _farpeekb(codeSel, codeOff + i);
}
// Read raw 8-byte LDT descriptor
uint8_t rawDesc[8];
__dpmi_get_descriptor(codeSel, rawDesc);
uint32_t ldtBase = (uint32_t)rawDesc[2] | ((uint32_t)rawDesc[3] << 8) |
((uint32_t)rawDesc[4] << 16) | ((uint32_t)rawDesc[7] << 24);
uint32_t ldtLimit = (uint32_t)rawDesc[0] | ((uint32_t)rawDesc[1] << 8) |
((uint32_t)(rawDesc[6] & 0x0F) << 16);
dbg("windrv: dsBase=0x%08X ptrVal=0x%08" PRIX32
" descBase=0x%08" PRIX32 " ldtBase=0x%08" PRIX32
" ldtLimit=0x%05" PRIX32 "\n",
__djgpp_base_address, storedLinear, descBase, ldtBase, ldtLimit);
dbg("windrv: rawDesc: %02X %02X %02X %02X %02X %02X %02X %02X\n",
rawDesc[0], rawDesc[1], rawDesc[2], rawDesc[3],
rawDesc[4], rawDesc[5], rawDesc[6], rawDesc[7]);
dbg("windrv: flat[%p]: %02X %02X %02X %02X %02X %02X %02X %02X\n",
flatPtr,
flatBytes[0], flatBytes[1], flatBytes[2], flatBytes[3],
flatBytes[4], flatBytes[5], flatBytes[6], flatBytes[7]);
dbg("windrv: far[%04X:%04X]: %02X %02X %02X %02X %02X %02X %02X %02X\n",
codeSel, codeOff,
farBytes[0], farBytes[1], farBytes[2], farBytes[3],
farBytes[4], farBytes[5], farBytes[6], farBytes[7]);
}
// Patch DoInt10h's INT 31h -> INT 64h BEFORE calling the entry point.
// The entry point calls SetupInt10h which self-modifies the Code segment
// (patches PUSHAD/POPAD on 386). We patch first so that when the entry
@ -1673,17 +1607,6 @@ int32_t wdrvEnable(WdrvHandleT handle, int32_t width, int32_t height, int32_t bp
handle->enabled = true;
// Watch the area ~0x4B8 bytes before end-of-.text. Corruption
// in VBESVGA consistently zeros a byte near this offset.
{
extern char etext;
uint32_t etextOff = (uint32_t)&etext;
uint32_t watchOff = etextOff - 0x4B8;
dbg("windrv: etext=0x%08" PRIX32 " watch=0x%08" PRIX32 "\n",
etextOff, watchOff);
thunkSetWatch(_my_ds(), watchOff);
}
setError(WDRV_OK);
return WDRV_OK;
}
@ -2207,7 +2130,6 @@ const char *wdrvGetLastErrorString(void)
void wdrvSetDebug(bool enable)
{
gDebug = enable;
extern void neSetDebug(bool enable);
neSetDebug(enable);
thunkSetDebug(enable);
stubSetDebug(enable);