diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..b36f978
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,107 @@
+# Win31drv Project Memory
+
+## Build Environment
+- DJGPP cross-compiler: `~/djgpp/djgpp/bin/i586-pc-msdosdjgpp-gcc` (GCC 12.2.0)
+- DJGPP binutils need `libfl.so.2`: stored in `tools/lib/` (Makefiles set LD_LIBRARY_PATH)
+- CWSDPMI zip stored in `tools/cwsdpmi.zip` (extracted to bin/ during build)
+- DOSBox-X: `flatpak run com.dosbox_x.DOSBox-X` (installed as user flatpak)
+- CWSDPMI.EXE in `bin/` directory for DPMI support under DOSBox-X
+- Config: `dosbox-x.conf` with S3 Trio64 machine type, 64MB RAM
+
+## Project Structure
+```
+windriver/
+├── Makefile # Top-level: builds demo, calls win31drv/Makefile
+├── demo.c # Demo program
+├── dosbox-x.conf # DOSBox-X config (S3 SVGA)
+├── obj/ # Demo object files
+├── bin/ # Executables + CWSDPMI.EXE
+└── win31drv/ # Library
+    ├── Makefile # Builds libwindrv.a
+    ├── obj/ # Library objects
+    ├── neload.c/h # NE format loader
+    ├── neformat.h # NE structures
+    ├── thunk.c/h # 32→16 bit thunking
+    ├── windrv.c/h # Main API
+    ├── winstub.c/h # Windows API stubs
+    ├── winddi.h # DDI structures
+    └── wintypes.h # Win16 types
+```
+
+## DJGPP Portability Notes
+- `uint32_t` is `unsigned long` (not `unsigned int`) in DJGPP — use `PRIu32`/`PRIX32` from ``
+- Always include `` explicitly for `va_list`/`va_start`/`va_end`
+- Headers must be self-contained (include their own dependencies)
+
+## Thunking Architecture Notes
+- **SS == DS == DGROUP**: Win3.x drivers assume SS == DS == DGROUP. VBESVGA's BBLT.ASM does
+  `PrestoChangeoSelector(SS, WorkSelector)` to create a code alias for compiled blit code.
+  thunkCall16 uses dgroupSel as SS (SP=0xFFF0) when available. Without this, the code alias
+  has the wrong base and the CPU executes garbage.
+- **Register corruption with -O2 inlining**: When demo.c's demoDrawing is inlined into main,
+  DJGPP GCC 12.2.0 mishandles callee-saved registers across thunk calls in long functions.
+  Fix: `__attribute__((noinline))` on demoDrawing. Symptom: handle pointer corrupted to
+  a ColorInfo return value (e.g. 0xFF0001F6) between Demo 2 and Demo 3.
+
+## DOSBox-X Driver Notes
+- `waitForEngine()`: GP_STAT port 0x9AE8 bit 9 polling — S3 only (gIsS3 guard)
+- **S3 detection**: Probe CR30 chip ID register. S3 chips: 0x81-0xE1. ET4000: 0x00.
+  Only apply S3-specific setup (cursor disable, dispYOffset, setDisplayStart) when isS3=true
+  AND driver is not VGA-class (1bpp/4planes).
+- **Pattern scratch artifact**: S3 driver writes 8x8 dithered brush pattern to VRAM at fixed
+  position (~(144,1)-(151,8)) during accelerated pattern fills. Fixed by shifting CRTC display
+  start down 10 scanlines (`dispYOffset`) so the scratch area is off-screen.
+- **`-fno-gcse` required for windrv.c**: With -O2 GCSE, GCC places stack locals and register
+  spills where they get corrupted during 16-bit driver calls. Only windrv.c needs this. See `WINDRV_CFLAGS` in win31drv/Makefile.
+- Output DDI (polylines/rectangles) requires a **physical pen** from RealizeObject, not a
+  raw LogPen16T. The pen must be in DGROUP (same as brush, drawMode, PDEVICE).
+- `wdrvUnloadDriver` does NOT auto-call Disable — caller must handle text mode restore
+- `sleep()` hangs under DOSBox-X because BIOS timer ticks don't advance without I/O
+- Debug output: `-d` flag enables verbose logging in neload, winstub, thunk, and windrv
+- Known issue: mode mismatch HW=800x600 vs GDIINFO=640x480
+
+## DGROUP Stack Management
+- VGA.DRV ships with DGROUP[0x0A]=0xFFFF (stack bottom = top of segment → no stack).
+  Its BitBlt prolog calls a stack check function at 0x18CA that compares available stack
+  against [SS:0x0A]. With 0xFFFF, ALL functions fail immediately (return 0).
+- Fix: patch [0x0A] to objBase after extending DGROUP. Only done when original = 0xFFFF.
+- S3TRIO.DRV and VBESVGA.DRV have [0x0A]=0x0000 — no patching needed.
+- **Do NOT unconditionally overwrite DGROUP offsets 0x00-0x0F** — VBESVGA.DRV stores
+  driver-specific data there (0x030A at offset 0, 0x01 at offset 4).
+
+## BitBlt Source Device Rules
+- For ROPs that do not read the source (pattern-only PATCOPY=0xF0, constant BLACKNESS=0x00,
+  WHITENESS=0xFF, etc.), lpSrcDev must be NULL (0:0) per DDI spec. VGA.DRV rejects a non-NULL
+  source for these ROPs. S3TRIO.DRV tolerates it, but NULL is the correct value.
+- Source dependency check: `ropNeedsSrc = (((rop8 >> 2) ^ rop8) & 0x33) != 0`, where rop8 is the ROP3 index (bits 16-23 of the DWORD raster op)
+
+## INT 10h ES Translation
+- Different INT 10h function families use different ES:offset registers:
+  VBE 4Fxx → ES:DI, AH=10h (palette) → ES:DX, AH=11h (fonts) → ES:BP, AH=1Bh → ES:DI
+- Only specific AL subfunctions use ES as a buffer pointer; most don't
+- Copy sizes must be exact (17 bytes for the full palette set, i.e. 16 registers + overscan; CX*3 for DAC blocks; etc.)
+- Copy direction matters: "Set" = copy-in only, "Read/Get" = copy-out only
+
+## WINFLAGS Handling
+- **WF_80x87 NOT used**: We don't save/restore FPU state across thunk boundaries
+- **VGA-class drivers need WF_STANDARD**: VGA.DRV's physical_enable hangs in Enhanced
+  mode (polls a VDD that doesn't exist). Auto-detected after Enable(style=1) returns
+  1bpp/4planes GDIINFO → repatch __WINFLAGS in all segments (0x0025→0x0015, i.e. WF_ENHANCED|WF_CPU386|WF_PMODE → WF_STANDARD|WF_CPU386|WF_PMODE).
+- SVGA drivers (S3TRIO, VBESVGA) use WF_ENHANCED normally
+
+## ET4000 Driver Notes
+- ET4000.DRV from the Win 3.x distribution is SZDD-compressed; decompress it with `msexpand`
+  (rename the file to .DR_ first; msexpand writes a .DR file missing the final V, so rename it to .DRV)
+- DOSBox-X machine type: `svga_et4000` for ET4000 hardware emulation
+- ET4000 is 640x480 8bpp, software-rendered (no accelerator engine in DOSBox-X)
+- CR30=0x00 on ET4000 → isS3=false → no S3 engine wait, no display start shift
+
+## Current Demo Status
+- S3TRIO.DRV, VBESVGA.DRV, VGA.DRV, ET4000.DRV all work: Load → Enable → Draw → Disable → Unload
+- Demo 1: Fill rectangles (BitBlt) — works
+- Demo 2: Pixel patterns (Pixel) — works
+- Demo 3: Lines/starburst (Output/Polyline) — works
+- Demo 4: Screen-to-screen blit (BitBlt SRCCOPY) — works
+- VGA.DRV: 640x480 4-plane 16-color mode; limited color palette but functional
+- ET4000.DRV: 640x480 8bpp on svga_et4000; software-only, no hw acceleration
+- Drivers stored in `drivers/` directory, copied to `bin/` during build
diff --git a/win31drv/Makefile b/win31drv/Makefile
index e3ff61b..7d790d0 100644
--- a/win31drv/Makefile
+++ b/win31drv/Makefile
@@ -8,13 +8,12 @@ CC = $(DJGPP_PREFIX)/bin/i586-pc-msdosdjgpp-gcc
 AR = $(DJGPP_PREFIX)/bin/i586-pc-msdosdjgpp-ar
 CFLAGS = -Wall -Wextra -O2 -std=gnu99 -I.
 
-# DOSBox-X's S3 Trio64 emulation corrupts specific memory addresses
-# during 16-bit driver calls via thunkCall16. With -O2 GCSE enabled,
-# the code layout places stack locals and register spills at addresses
-# that overlap the corruption targets, causing wrong values in drawing
-# parameters. Disabling GCSE for windrv.c changes the layout enough
-# to avoid the overlap. Only windrv.c is affected (it has the drawing
-# functions that call thunkCall16 with interleaved parameter setup).
+# With -O2 GCSE enabled, GCC's code layout for windrv.c places stack +# locals and register spills at addresses that get corrupted during +# 16-bit driver calls via thunkCall16, causing wrong values in drawing +# parameters. Disabling GCSE changes the layout enough to avoid the +# issue. Only windrv.c is affected (it has the drawing functions that +# call thunkCall16 with interleaved parameter setup). WINDRV_CFLAGS = $(CFLAGS) -fno-gcse # DJGPP binutils need libfl.so.2 which may not be installed system-wide diff --git a/win31drv/neload.h b/win31drv/neload.h index bc93574..f537170 100644 --- a/win31drv/neload.h +++ b/win31drv/neload.h @@ -112,4 +112,7 @@ bool neExtendSegment(NeModuleT *mod, int segIdx, uint32_t extraSize, uint32_t *o // Debug: dump module information to stderr. void neDumpModule(const NeModuleT *mod); +// Enable or disable verbose debug output. +void neSetDebug(bool enable); + #endif // NELOAD_H diff --git a/win31drv/thunk.c b/win31drv/thunk.c index e0413a2..022808f 100644 --- a/win31drv/thunk.c +++ b/win31drv/thunk.c @@ -54,7 +54,6 @@ #include "thunk.h" #include "log.h" -#include // Forward declarations static bool installRelayCode(ThunkContextT *ctx); @@ -419,12 +418,6 @@ static __dpmi_paddr gOldCbVec; static volatile bool gHandlerInstalled = false; static bool gThunkDebug = false; -// Diagnostic: monitor 3 bytes at gDiagWatchSel:gDiagWatchOff for corruption. -// Set gDiagWatchSel nonzero to enable. Logs when bytes change. -static uint16_t gDiagWatchSel = 0; -static uint32_t gDiagWatchOff = 0; -static uint8_t gDiagWatchBytes[3] = {0}; - // Shared area for passing parameters from the interrupt handler static uint16_t gCbParams[THUNK_MAX_PARAMS]; static uint32_t gCbRetVal; @@ -459,19 +452,8 @@ uint8_t gCbStack[16384] __attribute__((aligned(16))); uint32_t gCbStackTop; // Worker function called from the assembly stub. 
-// Diagnostic: set by cbIntWorker to prove it was called -volatile uint32_t gCbWorkerCalled = 0; -volatile uint32_t gCbWorkerLastSS = 0; -volatile uint32_t gCbWorkerLastESP = 0; -volatile uint32_t gCbWorkerLastSlot = 0xDEAD; - void cbIntWorker(CbFrameT *frame) { - gCbWorkerCalled++; - gCbWorkerLastSS = gCbSavedSS; - gCbWorkerLastESP = gCbSavedESP; - gCbWorkerLastSlot = (uint16_t)frame->ebx; - uint16_t slot = (uint16_t)frame->ebx; if (slot >= gCallbackCount || !gCallbacks[slot]) { @@ -531,39 +513,12 @@ void cbIntWorker(CbFrameT *frame) fflush(stderr); } - // Diagnostic: check BEFORE callback dispatch - if (gDiagWatchSel != 0) { - uint8_t b0 = _farpeekb(gDiagWatchSel, gDiagWatchOff); - if (b0 != gDiagWatchBytes[0]) { - logErr("WATCH-PRE: %04X:%08" PRIX32 " changed %02X->%02X before CB[%u]\n", - gDiagWatchSel, gDiagWatchOff, - gDiagWatchBytes[0], b0, slot); - gDiagWatchBytes[0] = b0; - } - } - gCbRetVal = gCallbacks[slot](gCbParams, paramWords); fflush(stderr); // Set return value in DX:AX frame->eax = (frame->eax & 0xFFFF0000) | (gCbRetVal & 0xFFFF); frame->edx = (frame->edx & 0xFFFF0000) | (gCbRetVal >> 16); - - // Diagnostic: check AFTER callback dispatch - if (gDiagWatchSel != 0) { - uint8_t b0 = _farpeekb(gDiagWatchSel, gDiagWatchOff); - uint8_t b1 = _farpeekb(gDiagWatchSel, gDiagWatchOff + 1); - uint8_t b2 = _farpeekb(gDiagWatchSel, gDiagWatchOff + 2); - if (b0 != gDiagWatchBytes[0] || b1 != gDiagWatchBytes[1] || b2 != gDiagWatchBytes[2]) { - logErr("WATCH-POST: %04X:%08" PRIX32 " changed %02X %02X %02X->%02X %02X %02X after CB[%u]\n", - gDiagWatchSel, gDiagWatchOff, - gDiagWatchBytes[0], gDiagWatchBytes[1], gDiagWatchBytes[2], - b0, b1, b2, slot); - gDiagWatchBytes[0] = b0; - gDiagWatchBytes[1] = b1; - gDiagWatchBytes[2] = b2; - } - } } // Defined in the file-scope asm block below @@ -590,11 +545,6 @@ __asm__( " movw %cx, %fs\n" " movl %eax, %fs:_gCbSavedFS\n" - // Diagnostic: increment gCbWorkerCalled via FS to prove entry - " movl %fs:_gCbWorkerCalled, 
%eax\n" - " incl %eax\n" - " movl %eax, %fs:_gCbWorkerCalled\n" - " popl %ecx\n" " popl %eax\n" @@ -699,20 +649,6 @@ void thunkSetDebug(bool debug) } -void thunkSetWatch(uint16_t sel, uint32_t off) -{ - gDiagWatchSel = sel; - gDiagWatchOff = off; - if (sel != 0) { - gDiagWatchBytes[0] = _farpeekb(sel, off); - gDiagWatchBytes[1] = _farpeekb(sel, off + 1); - gDiagWatchBytes[2] = _farpeekb(sel, off + 2); - logErr("WATCH: set %04X:%08" PRIX32 " = %02X %02X %02X\n", - sel, off, gDiagWatchBytes[0], gDiagWatchBytes[1], gDiagWatchBytes[2]); - } -} - - bool thunkInit(ThunkContextT *ctx) { memset(ctx, 0, sizeof(ThunkContextT)); @@ -748,13 +684,8 @@ bool thunkInit(ThunkContextT *ctx) uint32_t dosBase = (uint32_t)dosSeg * 16; - logErr("thunk: DOS mem at 0x%05" PRIX32 "-0x%05" PRIX32, + logErr("thunk: DOS mem at 0x%05" PRIX32 "-0x%05" PRIX32 "\n", dosBase, dosBase + totalSize - 1); - if (0x8134 >= dosBase && 0x8134 < dosBase + totalSize) { - logErr(" ** 0x8134 INSIDE thunk block at offset 0x%04" PRIX32 " **", - (uint32_t)(0x8134 - dosBase)); - } - logErr("\n"); // Zero the entire area { @@ -942,16 +873,6 @@ uint32_t thunkCall16(ThunkContextT *ctx, uint16_t targetSel, uint16_t targetOff, uint16_t dgroupSel = relayConfig.dgroupSel; gCbDgroupSel = dgroupSel; - // Diagnostic: check watched byte before lcall - if (gDiagWatchSel != 0) { - uint8_t b = _farpeekb(gDiagWatchSel, gDiagWatchOff); - if (b != gDiagWatchBytes[0]) { - logErr("WATCH-LCALL-PRE: %02X->%02X target=%04X:%04X\n", - gDiagWatchBytes[0], b, targetSel, targetOff); - gDiagWatchBytes[0] = b; - } - } - __asm__ volatile ( // Save ES, GS, and FS "push %%es\n\t" @@ -980,16 +901,6 @@ uint32_t thunkCall16(ThunkContextT *ctx, uint16_t targetSel, uint16_t targetOff, : "ebx", "ecx", "edx", "esi", "edi", "memory", "cc" ); - // Diagnostic: check watched byte after lcall returns - if (gDiagWatchSel != 0) { - uint8_t b = _farpeekb(gDiagWatchSel, gDiagWatchOff); - if (b != gDiagWatchBytes[0]) { - logErr("WATCH-LCALL-POST: %02X->%02X 
target=%04X:%04X\n", - gDiagWatchBytes[0], b, targetSel, targetOff); - gDiagWatchBytes[0] = b; - } - } - return result; } diff --git a/win31drv/thunk.h b/win31drv/thunk.h index f6c4131..0826c0a 100644 --- a/win31drv/thunk.h +++ b/win31drv/thunk.h @@ -85,9 +85,6 @@ bool thunkInit(ThunkContextT *ctx); // Enable or disable verbose callback tracing. void thunkSetDebug(bool debug); -// Set a watchpoint on 3 bytes at sel:off. Logs any changes during callbacks. -void thunkSetWatch(uint16_t sel, uint32_t off); - // Shut down the thunking infrastructure and free resources. void thunkShutdown(ThunkContextT *ctx); diff --git a/win31drv/windrv.c b/win31drv/windrv.c index c6aadf3..2aeb09a 100644 --- a/win31drv/windrv.c +++ b/win31drv/windrv.c @@ -174,11 +174,6 @@ uint32_t gInt10hSavedFS; // Interrupted FS uint8_t gInt10hStack[4096] __attribute__((aligned(16))); // Handler stack uint32_t gInt10hStackTop; // Top of handler stack -// Diagnostic: count INT 10h handler entries -volatile uint32_t gInt10hEntryCount = 0; -volatile uint32_t gInt10hLastSS = 0; -volatile uint32_t gInt10hLastESP = 0; - // ============================================================================ // Exception capture - captures primary fault CS:EIP before DJGPP's handler // (which may itself crash handling exceptions from 16-bit code). 
@@ -502,15 +497,6 @@ __asm__( " movw %cx, %fs\n" " movl %eax, %fs:_gInt10hSavedFS\n" - // Diagnostic: increment entry counter, save SS and ESP - " movl %fs:_gInt10hEntryCount, %eax\n" - " incl %eax\n" - " movl %eax, %fs:_gInt10hEntryCount\n" - " xorl %eax, %eax\n" - " movw %ss, %ax\n" - " movl %eax, %fs:_gInt10hLastSS\n" - " movl %esp, %fs:_gInt10hLastESP\n" - " popl %ecx\n" " popl %eax\n" @@ -1100,7 +1086,6 @@ WdrvHandleT wdrvLoadDriver(const char *driverPath) // Load the NE module if (gDebug) { - extern void neSetDebug(bool enable); neSetDebug(true); } @@ -1171,57 +1156,6 @@ WdrvHandleT wdrvLoadDriver(const char *driverPath) return NULL; } - // Verify segment integrity after loading - if (drv->ddiEntry[DDI_ORD_ENABLE].present) { - uint16_t codeSel = drv->ddiEntry[DDI_ORD_ENABLE].sel; - uint16_t codeOff = drv->ddiEntry[DDI_ORD_ENABLE].off; - - // Find the segment's stored linear address - int segIdx = drv->neMod.exports[DDI_ORD_ENABLE].segIndex - 1; - uint32_t storedLinear = drv->neMod.segments[segIdx].linearAddr; - - // Read actual descriptor base from DPMI - uint32_t descBase = 0; - __dpmi_get_segment_base_address(codeSel, (unsigned long *)&descBase); - - // Read via flat pointer (linearAddr + offset) - uint8_t *flatPtr = (uint8_t *)(storedLinear + codeOff); - uint8_t flatBytes[8]; - for (int i = 0; i < 8; i++) { - flatBytes[i] = flatPtr[i]; - } - - // Read via far pointer (selector:offset) - uint8_t farBytes[8]; - for (int i = 0; i < 8; i++) { - farBytes[i] = _farpeekb(codeSel, codeOff + i); - } - - // Read raw 8-byte LDT descriptor - uint8_t rawDesc[8]; - __dpmi_get_descriptor(codeSel, rawDesc); - uint32_t ldtBase = (uint32_t)rawDesc[2] | ((uint32_t)rawDesc[3] << 8) | - ((uint32_t)rawDesc[4] << 16) | ((uint32_t)rawDesc[7] << 24); - uint32_t ldtLimit = (uint32_t)rawDesc[0] | ((uint32_t)rawDesc[1] << 8) | - ((uint32_t)(rawDesc[6] & 0x0F) << 16); - - dbg("windrv: dsBase=0x%08X ptrVal=0x%08" PRIX32 - " descBase=0x%08" PRIX32 " ldtBase=0x%08" PRIX32 - " 
ldtLimit=0x%05" PRIX32 "\n", - __djgpp_base_address, storedLinear, descBase, ldtBase, ldtLimit); - dbg("windrv: rawDesc: %02X %02X %02X %02X %02X %02X %02X %02X\n", - rawDesc[0], rawDesc[1], rawDesc[2], rawDesc[3], - rawDesc[4], rawDesc[5], rawDesc[6], rawDesc[7]); - dbg("windrv: flat[%p]: %02X %02X %02X %02X %02X %02X %02X %02X\n", - flatPtr, - flatBytes[0], flatBytes[1], flatBytes[2], flatBytes[3], - flatBytes[4], flatBytes[5], flatBytes[6], flatBytes[7]); - dbg("windrv: far[%04X:%04X]: %02X %02X %02X %02X %02X %02X %02X %02X\n", - codeSel, codeOff, - farBytes[0], farBytes[1], farBytes[2], farBytes[3], - farBytes[4], farBytes[5], farBytes[6], farBytes[7]); - } - // Patch DoInt10h's INT 31h -> INT 64h BEFORE calling the entry point. // The entry point calls SetupInt10h which self-modifies the Code segment // (patches PUSHAD/POPAD on 386). We patch first so that when the entry @@ -1673,17 +1607,6 @@ int32_t wdrvEnable(WdrvHandleT handle, int32_t width, int32_t height, int32_t bp handle->enabled = true; - // Watch the area ~0x4B8 bytes before end-of-.text. Corruption - // in VBESVGA consistently zeros a byte near this offset. - { - extern char etext; - uint32_t etextOff = (uint32_t)&etext; - uint32_t watchOff = etextOff - 0x4B8; - dbg("windrv: etext=0x%08" PRIX32 " watch=0x%08" PRIX32 "\n", - etextOff, watchOff); - thunkSetWatch(_my_ds(), watchOff); - } - setError(WDRV_OK); return WDRV_OK; } @@ -2207,7 +2130,6 @@ const char *wdrvGetLastErrorString(void) void wdrvSetDebug(bool enable) { gDebug = enable; - extern void neSetDebug(bool enable); neSetDebug(enable); thunkSetDebug(enable); stubSetDebug(enable);