calog -- Polyglot Script Broker: Design & Implementation Plan

A C "broker" that lets one application be written in a mix of scripting languages (Lua and my-basic first; Squirrel and others later). Native C functions are added once and become callable from every language. Functions and data exported from one module are callable from modules written in another language. Threading is actor-based; networking rides the same dispatcher. Data sharing is by-value for v1.

This document is the reconciled output of a design pass plus an adversarial verification pass. Where the verification corrected the first-cut design, the correction is folded in and noted as "[verified]".


1. Architecture: hub and spoke

Nothing talks to anything else directly. Every engine talks only to the broker, through two shared contracts:

  1. One universal value type, ValueT (a tagged union).
  2. One uniform native-function signature:
typedef int32_t (*NativeFnT)(ValueT *args, int32_t argCount, ValueT *result, void *userData);

A developer writes a native function once against that signature and registers it once. A script function exported from a module is itself stored as a NativeFnT whose body re-enters its owning interpreter -- so "call C from script", "call script from C", and "call module A's function from module B" are all the same code path. Adding an engine is O(1) adapter work, not O(N) per existing engine.

The single most important lesson from the verification pass: there must be exactly one ValueT / AggregateT / ValueTypeE, defined in broker.h, included verbatim by every adapter. The first-cut design had three divergent copies and would not have linked, let alone round-tripped data. Section 2 is therefore the load-bearing part of this plan.


2. The canonical type system (broker.h) -- single source of truth

2.1 Value tags and the value struct

typedef enum ValueTypeE {
    valueNilE       = 0,
    valueBoolE      = 1,
    valueIntE       = 2,   // int64_t
    valueRealE      = 3,   // double
    valueStringE    = 4,   // length-prefixed, binary-safe
    valueAggregateE = 5,   // hybrid array + map container
    valueFnE        = 6    // function value: refcounted handle to a CallableT
} ValueTypeE;


typedef struct StringT {
    char    *bytes;    // owned; always NUL-terminated at bytes[length] for C consumers
    int64_t  length;   // byte count excluding the convenience terminator (binary-safe)
} StringT;


typedef struct ValueT {
    ValueTypeE type;
    union {
        bool             b;
        int64_t          i;
        double           r;
        StringT          s;
        struct AggregateT *agg;       // heap-owned subtree
        struct CallableT  *callable;  // refcounted, broker-owned
    } as;
} ValueT;
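
As an illustration of the "write once, register once" claim, a native function written against NativeFnT might look like the sketch below. It uses a reduced, scalar-only copy of ValueT so the block stands alone; addNumbers and brokerErrArgsE are hypothetical names, and the numeric values of the status codes are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced copy of the canonical types: scalars only, enough for this sketch. */
typedef enum ValueTypeE { valueNilE = 0, valueBoolE = 1, valueIntE = 2, valueRealE = 3 } ValueTypeE;

typedef struct ValueT {
    ValueTypeE type;
    union { bool b; int64_t i; double r; } as;
} ValueT;

typedef int32_t (*NativeFnT)(ValueT *args, int32_t argCount, ValueT *result, void *userData);

/* Assumed status values; brokerErrArgsE is a hypothetical code for bad arguments. */
enum { brokerOkE = 0, brokerErrArgsE = 1 };

/* addNumbers(a, b): int + int stays int; any real operand makes the result real. */
int32_t addNumbers(ValueT *args, int32_t argCount, ValueT *result, void *userData) {
    (void)userData;
    assert(result != NULL);
    result->type = valueNilE;
    if (argCount != 2) return brokerErrArgsE;
    for (int32_t n = 0; n < 2; n++) {
        if (args[n].type != valueIntE && args[n].type != valueRealE) return brokerErrArgsE;
    }
    if (args[0].type == valueIntE && args[1].type == valueIntE) {
        /* Add through unsigned to avoid signed-overflow UB; overflow policy is out of scope here. */
        result->type = valueIntE;
        result->as.i = (int64_t)((uint64_t)args[0].as.i + (uint64_t)args[1].as.i);
    } else {
        double a = (args[0].type == valueIntE) ? (double)args[0].as.i : args[0].as.r;
        double b = (args[1].type == valueIntE) ? (double)args[1].as.i : args[1].as.r;
        result->type = valueRealE;
        result->as.r = a + b;
    }
    return brokerOkE;
}
```

Because the function sees only ValueT, the same body serves every engine: each adapter's trampoline does the marshalling on either side of the call.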

2.2 The aggregate (one shape both engines map onto)

typedef struct PairT {
    ValueT key;     // marshal layer constrains to int/real/string keys (see 2.5)
    ValueT value;
} PairT;


typedef enum AggregateKindE {
    aggregateListE = 0,   // empty container round-trips as a list by default
    aggregateMapE  = 1,
    aggregateBothE = 2    // array part AND pairs part populated
} AggregateKindE;


typedef struct AggregateT {
    AggregateKindE  kind;        // disambiguates empty/mixed containers across engines
    ValueT         *array;       // dense elements [0, arrayCount)
    int64_t         arrayCount;
    int64_t         arrayCap;
    PairT          *pairs;       // map part; preserves insertion order
    int64_t         pairCount;
    int64_t         pairCap;
} AggregateT;

A Lua table's sequence part maps to array, its remaining keys to pairs. A my-basic LIST maps to array, a DICT maps to pairs. The explicit kind flag fixes two problems the verifier flagged: an empty {} had no defined type on the far side, and a mixed array+hash Lua table had no representation at all.
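
One way an adapter could set the kind flag on Lua ingress, where the table itself carries no declared kind, is to derive it from which parts ended up populated. This is a minimal sketch; aggregateKindFromCounts is a hypothetical helper, and a my-basic adapter would more likely set the kind directly from the source collection type (LIST or DICT) so that an empty DICT stays a map.

```c
#include <assert.h>
#include <stdint.h>

typedef enum AggregateKindE {
    aggregateListE = 0,   /* empty container defaults to a list */
    aggregateMapE  = 1,
    aggregateBothE = 2    /* array part AND pairs part populated */
} AggregateKindE;

/* Hypothetical helper: choose the kind from the number of entries in each part. */
AggregateKindE aggregateKindFromCounts(int64_t arrayCount, int64_t pairCount) {
    assert(arrayCount >= 0 && pairCount >= 0);
    if (arrayCount > 0 && pairCount > 0) return aggregateBothE;
    if (pairCount > 0) return aggregateMapE;
    return aggregateListE;   /* pure sequence, or empty {} */
}
```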

2.3 Function values: CallableT

valueFnE is the one deliberate exception to "by-value everything". A function cannot be meaningfully copied between heaps, so it is shared by reference -- but safely, because it is only ever invoked, never inspected, and invocation always routes to the owning context's thread.

typedef struct CallableT {
    NativeFnT  fn;            // uniform invoke entry
    void      *userData;      // luaL_ref slot, pinned mb_value_t, or C closure ctx
    uint32_t   ownerCtxId;    // context whose thread MUST run it
    uint32_t   ownerGen;      // generation of that context (UAF guard, see sec 9)
    int32_t    refCount;      // ATOMIC; shared-handle lifetime across threads
    bool       alive;         // false once the owning context is torn down
} CallableT;

Rules (these resolve the verifier's critical "function value breaks the no-shared-pointer invariant" finding):

  • valueCopy of a valueFnE does an atomic refcount increment on the same CallableT -- it does NOT clone the closure. The shared CallableT* across threads is allowed precisely because refCount is atomic and the closure is touched only on its owner thread.
  • valueFree of a valueFnE does an atomic decrement. When it hits zero, the underlying closure must be released with an interpreter call (luaL_unref, my-basic unref) -- which can only run on the owner's thread. So a zero-drop on a foreign thread posts a release message to the owner context rather than calling the interpreter directly (sec 10).
  • Invoking a dead handle (alive == false, owner gone) returns a clean broker error, never a call into a freed interpreter.
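
The refcount rules above can be sketched with C11 atomics. The struct is trimmed to the fields the decision needs, and ReleaseActionE, callableRetain, and callableDrop are hypothetical names: the sketch only decides what must happen on the final drop and leaves the interpreter call and the message post to the caller. A real implementation would also need to synchronize access to alive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Trimmed CallableT: fn/userData/ownerGen omitted for brevity. */
typedef struct CallableT {
    uint32_t   ownerCtxId;
    atomic_int refCount;
    bool       alive;
} CallableT;

/* Hypothetical result of a drop: what the caller must do next. */
typedef enum ReleaseActionE {
    releaseNoneE   = 0,   /* other handles remain */
    releaseInlineE = 1,   /* last drop on the owner thread: release the closure here */
    releasePostE   = 2,   /* last drop on a foreign thread: post a release message */
    releaseShellE  = 3    /* owner already gone: free the shell only */
} ReleaseActionE;

/* valueCopy path: share the same CallableT, never clone the closure. */
void callableRetain(CallableT *c) {
    atomic_fetch_add_explicit(&c->refCount, 1, memory_order_relaxed);
}

/* valueFree path: only the thread that takes the count to zero acts. */
ReleaseActionE callableDrop(CallableT *c, uint32_t currentCtxId) {
    if (atomic_fetch_sub_explicit(&c->refCount, 1, memory_order_acq_rel) != 1) {
        return releaseNoneE;
    }
    if (!c->alive) return releaseShellE;
    return (currentCtxId == c->ownerCtxId) ? releaseInlineE : releasePostE;
}
```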

2.4 Value operation contract (one signature, used everywhere)

The verifier caught two designs declaring valueCopy with incompatible signatures. The canonical form -- status-returning, dst-by-pointer, so OOM is checkable on the hot path:

int32_t valueCopy(ValueT *dst, const ValueT *src);   // deep copy; brokerOkE / brokerErrOomE
void    valueFree(ValueT *v);                          // recursive free; leaves a safe nil
void    valueMove(ValueT *dst, ValueT *src);           // zero-alloc ownership transfer

valueFree and valueCopy MUST have an explicit case for every tag, including valueFnE, plus a default: branch for unknown tags -- a missing case was how the first cut silently leaked function payloads.
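
A minimal sketch of the three operations, restricted to nil, int, and string so it stays short (aggregates and function values would add recursive and refcounted cases). The status values and brokerErrTypeE are assumptions, and valueCopy here assumes dst does not currently own anything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum ValueTypeE { valueNilE = 0, valueIntE = 2, valueStringE = 4 } ValueTypeE;
typedef struct StringT { char *bytes; int64_t length; } StringT;
typedef struct ValueT {
    ValueTypeE type;
    union { int64_t i; StringT s; } as;
} ValueT;

/* Assumed status values; brokerErrTypeE is a hypothetical code for an unknown tag. */
enum { brokerOkE = 0, brokerErrOomE = 1, brokerErrTypeE = 2 };

/* Frees anything owned and leaves a safe nil behind. */
void valueFree(ValueT *v) {
    switch (v->type) {
        case valueStringE: free(v->as.s.bytes); break;
        case valueNilE:
        case valueIntE:    break;
        default:           break;   /* unknown tag: nothing this sketch knows how to free */
    }
    memset(v, 0, sizeof *v);
    v->type = valueNilE;
}

/* Deep copy; dst must not own anything on entry. */
int32_t valueCopy(ValueT *dst, const ValueT *src) {
    dst->type = valueNilE;
    switch (src->type) {
        case valueNilE: return brokerOkE;
        case valueIntE: *dst = *src; return brokerOkE;
        case valueStringE: {
            size_t n = (size_t)src->as.s.length;
            char *bytes = malloc(n + 1);
            if (bytes == NULL) return brokerErrOomE;
            if (n > 0) memcpy(bytes, src->as.s.bytes, n);   /* length-based: keeps embedded NULs */
            bytes[n] = '\0';                                /* convenience terminator */
            dst->type = valueStringE;
            dst->as.s.bytes = bytes;
            dst->as.s.length = src->as.s.length;
            return brokerOkE;
        }
        default: return brokerErrTypeE;
    }
}

/* Ownership transfer without allocation; src becomes nil. */
void valueMove(ValueT *dst, ValueT *src) {
    *dst = *src;
    memset(src, 0, sizeof *src);
    src->type = valueNilE;
}
```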

2.5 Cross-engine fidelity table (the honest limits)

By-value marshalling across Lua <-> broker <-> my-basic is lossless for the common cases and lossy at documented edges. These are inherent to my-basic's value model (verified against my_basic.h/.c), not marshalling bugs:

| Aspect | Lua | my-basic | Crossing Lua <-> BASIC |
|---|---|---|---|
| integers | 64-bit | int_t == int (32-bit, always) | truncates above 2^31 -- range-check + error |
| reals | double | float (double w/ -DMB_DOUBLE_FLOAT) | precision loss unless double build |
| int vs real subtype | distinct (5.4) | integral real auto-collapses to int | subtype not preserved through BASIC |
| strings | binary-safe (length) | bare char* (strlen) | embedded NUL truncates -- detect + error |
| array / list | table sequence part | LIST | OK |
| map / dict | table hash part | DICT (int/real/string keys) | OK |
| mixed array+hash table | one table | no single LIST+DICT value | collapses to DICT (array part -> int keys), documented |
| empty {} | table | LIST or DICT? | kind flag; default LIST |
| function value | closure (luaL_ref) | lambda/routine (MB_DT_ROUTINE) | OK via CallableT (by-reference) |
| nested depth | bounded | bounded | one shared cap; defined error past it |
| cycles | rejected on ingress | rejected on ingress | impossible at rest (by-value, no shared refs) |

Policy decisions baked in to make the above deterministic:

  • One shared recursion-depth cap applied on every recursive path -- both ingress and egress, all components -- failing with a defined status instead of overflowing the C stack. (The first cut bounded only Lua ingress.)
  • One strict/lenient switch, owned by the broker and honored by both adapters: in strict mode an unrepresentable value/key (e.g. a function used as a table key, a 64-bit int into BASIC, a NUL-bearing string into BASIC) is an error; in lenient mode it is dropped/coerced with a documented rule. Never a silent surprise either way.
  • Keys: broker allows int/real/string keys (my-basic dicts accept all three). Float keys get defined equality; non-representable keys follow the strict/lenient switch.
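
As one concrete instance of the strict/lenient switch, the sketch below handles a 64-bit broker int entering BASIC. It assumes the lenient rule is "promote to real", which is one of the two options the document mentions for this edge; MarshalModeE, BasicNumberT, and intToBasic are hypothetical names, and the numeric status values are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum MarshalModeE { marshalStrictE = 0, marshalLenientE = 1 } MarshalModeE;

/* Assumed status values. */
enum { brokerOkE = 0, brokerErrRangeE = 3 };

/* Hypothetical carrier for a number on its way into my-basic. */
typedef struct BasicNumberT {
    bool    isReal;
    int32_t i;   /* valid when !isReal */
    double  r;   /* valid when isReal  */
} BasicNumberT;

/* In range: pass through as a 32-bit int. Out of range: error (strict) or real (lenient). */
int32_t intToBasic(int64_t in, MarshalModeE mode, BasicNumberT *out) {
    if (in >= INT32_MIN && in <= INT32_MAX) {
        out->isReal = false;
        out->i = (int32_t)in;
        out->r = 0.0;
        return brokerOkE;
    }
    if (mode == marshalStrictE) return brokerErrRangeE;
    out->isReal = true;          /* documented coercion; precision is lost beyond 2^53 */
    out->i = 0;
    out->r = (double)in;
    return brokerOkE;
}
```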

3. The engine vtable -- what makes the actor loop engine-agnostic

Each adapter implements one small vtable so the broker/actor core never special-cases an engine:

typedef struct EngineT {
    const char *name;
    void   *(*createInterpreter)(struct ScriptContextT *ctx);   // ON the owning thread
    void    (*destroyInterpreter)(void *interp);
    int32_t (*loadSource)(void *interp, const char *src, int64_t len);
    int32_t (*registerNative)(void *interp, const char *name, NativeFnT fn, void *userData);
    int32_t (*callExport)(void *interp, void *exportRef, ValueT *args, int32_t argCount, ValueT *result);
    void    (*releaseExport)(void *interp, void *exportRef);     // ON the owning thread
} EngineT;

createInterpreter, destroyInterpreter, releaseExport, and every callExport run on the context's own thread -- that is the invariant that keeps each interpreter single-threaded.


4. The Lua adapter

Target Lua 5.4 (note 5.1/5.3 deltas where they matter). The C API surface was verified accurate; the fixes below are about lifecycle, not API names.

4.1 Native registration and the trampoline

Each registered NativeFnT becomes a Lua C closure: push the NativeFnT and its userData as upvalues with lua_pushcclosure, then lua_setglobal (or into a module table). The single trampoline recovers them from upvalues, marshals the Lua stack into a ValueT[], calls the NativeFnT, and pushes the result back.

  • lua_setglobal returns void (the first cut documented int -- harmless but wrong).
  • Lua allocation APIs (lua_newuserdatauv, lua_createtable, ...) longjmp on OOM and never return NULL -- so NULL-checks after them are dead code; the OOM path is a Lua error, not a C return. Only luaL_newstate can return NULL and must be checked.

4.2 Marshalling ValueT <-> Lua (by value)

Scalars are direct. Strings use lua_tolstring + length (binary-safe, preserves NULs). Tables deep-copy in both directions:

  • Ingress (table -> AggregateT): normalize the table index to absolute before lua_next; route numeric keys through lua_tointeger/lua_tonumber (do not let lua_tolstring mutate a numeric key in place); fill array for the sequence part and pairs for the rest; detect cycles via an ancestor-pointer stack (lua_topointer); use lua_rawset/lua_rawget to avoid metamethods; enforce the shared depth cap.
  • Egress (AggregateT -> table): build with lua_createtable, populate array as the sequence and pairs as keyed entries, with the same depth cap (the first cut had no egress cap -- a deep BASIC-origin structure could overflow the stack on the way back). Make the builder self-balancing: record lua_gettop on entry and lua_settop back on any error return.

4.3 Exporting a Lua function (and the leak fix)

A Lua function crossing the boundary becomes a valueFnE: pin the closure with luaL_ref(L, LUA_REGISTRYINDEX), wrap it in a CallableT. The fn body does lua_rawgeti to retrieve, marshals args onto the stack, lua_pcall, marshals the return; on error it pulls the message with luaL_tolstring and reports it as a broker error (sec 8).

Two bugs the verifier found, fixed here:

  • The exports array needs grow-on-demand. The context is calloc'd, so the array starts NULL/0; the first export must realloc (double the cap, handle the NULL/0 seed) before storing, and luaL_unref the just-created ref if the realloc fails.
  • Transient vs persistent ownership. Every Lua function passed as a native argument was creating a luaL_ref that only got released at lua_close -- an unbounded leak for any long-lived context that takes callbacks. Fix: the CallableT refcount owns the luaL_ref. When the last valueFree drops the handle to zero, the ref is released (on the Lua thread per sec 10). A function the broker retains as a real export holds a reference for as long as it is registered; a function merely borrowed for the duration of one call is released when that call's ValueT args are freed. Same mechanism, two lifetimes.
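
The grow-on-demand fix can be sketched without any Lua dependency. ExportsT and exportsPush are hypothetical names, the refs are plain ints standing in for luaL_ref results, and the initial capacity of 4 is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed status values. */
enum { brokerOkE = 0, brokerErrOomE = 1 };

/* Hypothetical exports array; a calloc'd context starts it as {NULL, 0, 0}. */
typedef struct ExportsT {
    int     *refs;
    int32_t  count;
    int32_t  cap;
} ExportsT;

/* Appends ref, doubling capacity as needed and seeding from the NULL/0 state. */
int32_t exportsPush(ExportsT *e, int ref) {
    if (e->count == e->cap) {
        int32_t newCap = (e->cap == 0) ? 4 : e->cap * 2;
        int *grown = realloc(e->refs, (size_t)newCap * sizeof *grown);   /* realloc(NULL, n) acts as malloc */
        if (grown == NULL) {
            return brokerErrOomE;   /* old array is untouched; the caller must luaL_unref the new ref */
        }
        e->refs = grown;
        e->cap = newCap;
    }
    e->refs[e->count++] = ref;
    return brokerOkE;
}
```

On the failure path the function stores nothing, so the caller can release the just-created registry ref and report the OOM without leaving a half-registered export.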

4.4 Calling a function value from Lua

A valueFnE marshalled into Lua becomes a Lua C closure over the CallableT*, so script authors just write cb(x, y) and it transparently dispatches (to the owner's thread if the function lives elsewhere). A universal call(fn, ...) native is also provided for uniformity across engines.

4.5 Context lifecycle

luaL_newstate + luaL_openlibs on the owning thread; confine the lua_State to that thread forever; teardown luaL_unrefs outstanding refs then lua_close.


5. The my-basic adapter

Verified against paladin-t/my_basic (my_basic.h + .c read directly). Zero hallucinated calls. The interesting work is three my-basic-specific quirks that shape the adapter; all three are forced by the source, not stylistic.

5.1 Lifecycle and the inverted register result

mb_init (once per process) / mb_open(&bas) / mb_load_string(bas, src, true) / mb_run / mb_close / mb_dispose. The broker pointer is threaded through the interp's single userdata slot via mb_set_userdata / mb_get_userdata.

  • mb_register_func returns a count, not a status -- nonzero means registered, 0 means duplicate/failure. That is the opposite of the MB_FUNC_OK == 0 convention, so the success test must be inverted. Names are uppercased internally (mb_strupr), so the broker key is the uppercased identifier (BASIC is case-insensitive).

5.2 The native-function protocol and the trampoline bank

The native signature is typedef int (*mb_func_t)(struct mb_interpreter_t*, void**); -- no per-callback userData parameter, and the interpreter has only one userdata slot. So a single shared C trampoline cannot tell which broker function it is serving.

Fix (the verifier confirmed this limitation is real): a macro-generated bank of trampolines mbTramp0 .. mbTrampN, each hardcoding its slot index, each looking up ctx->nativeBank[slot] (the NativeFnT + userData) via the interpreter's userdata pointer. The bank size caps how many natives one my-basic context can host; size it generously and document it.
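
A minimal sketch of the bank technique, independent of the my-basic headers. BasicFnT is a stand-in with the same shape as mb_func_t, the interp pointer is cast straight to the context where the real adapter would go through mb_get_userdata, and BindingT, bankDispatch, returnTag, and the bank size of 4 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BANK_SIZE 4

/* Stand-in for mb_func_t: no per-callback userData parameter. */
typedef int (*BasicFnT)(void *interp, void **l);

/* Simplified binding; the real slot would hold a NativeFnT + userData. */
typedef struct BindingT {
    int  (*fn)(void *userData);
    void  *userData;
} BindingT;

typedef struct ContextT {
    BindingT nativeBank[BANK_SIZE];
} ContextT;

void contextInit(ContextT *ctx) {
    for (int slot = 0; slot < BANK_SIZE; slot++) {
        ctx->nativeBank[slot].fn = NULL;
        ctx->nativeBank[slot].userData = NULL;
    }
}

/* Shared body: the slot index is the only thing each trampoline adds. */
int bankDispatch(void *interp, void **l, int slot) {
    ContextT *ctx = (ContextT *)interp;   /* assumption: real code reads the interpreter's userdata slot */
    (void)l;
    BindingT *b = &ctx->nativeBank[slot];
    if (b->fn == NULL) return -1;         /* empty slot */
    return b->fn(b->userData);
}

/* Each expansion hardcodes one slot index into a distinct function. */
#define DEFINE_TRAMP(n) \
    int mbTramp##n(void *interp, void **l) { return bankDispatch(interp, l, n); }

DEFINE_TRAMP(0)
DEFINE_TRAMP(1)
DEFINE_TRAMP(2)
DEFINE_TRAMP(3)

const BasicFnT trampBank[BANK_SIZE] = { mbTramp0, mbTramp1, mbTramp2, mbTramp3 };

/* Example binding used to show which slot was served. */
int returnTag(void *userData) { return *(int *)userData; }
```

Registering the k-th native then means filling nativeBank[k] and handing trampBank[k] to the interpreter, which is why the bank size is a hard cap on natives per context.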

Inside a trampoline the argument protocol is the real my-basic frame dance: mb_attempt_open_bracket / loop mb_pop_value (honoring mb_has_arg) / mb_attempt_close_bracket / compute / mb_push_value (or the typed mb_push_*).

5.3 String ownership (memdup is mandatory)

mb_pop_string hands back a borrowed interior pointer -- the broker must strdup/copy it immediately. Pushed strings are taken over by the interpreter and later freed with its allocator, so any string handed to mb_push_value/mb_make_string must come from mb_memdup (not plain malloc). Embedded NULs cannot survive (bare char* + strlen) -- enforce the strict/lenient policy on egress.

5.4 Aggregates: the collection API

There is no mb_make_coll. A list/dict is built by presetting coll->type = MB_DT_LIST/MB_DT_DICT then calling mb_init_coll, and accessed with mb_get_coll / mb_set_coll / mb_remove_coll / mb_count_coll / mb_keys_of_coll. Collection support is on by default (MB_ENABLE_COLLECTION_LIB). A broker aggregate with both array and pairs populated collapses to a DICT (array part becomes integer keys) per the fidelity table.

5.5 Exporting a BASIC routine -- the parked __BROKERSERVE frame

This is the my-basic-specific crux. To call a BASIC routine/lambda from C you use mb_get_routine(s, l, name, &val) then mb_eval_routine(s, l, val, args, argc, &ret) -- and mb_eval_routine dereferences *l and hard-requires a live, non-NULL void** l (verified at my_basic.c:14344/14358). A valid l only exists inside a running native call. Therefore a my-basic context cannot be driven from arbitrary C; it must be parked inside a native frame.

Design: register a native __BROKERSERVE whose C body is the context's message-pump / serve loop. A module hands control to the broker by ending with a SERVE call (the adapter appends one if absent). While parked there, the loop holds a valid l, which it uses to mb_eval_routine whenever another context calls one of this module's exported routines. mb_get_routine returns MB_FUNC_OK with a nil value when a name is absent, so the not-found test is routine.type != MB_DT_ROUTINE, not the status code.

5.6 Numeric and identity caveats

int_t is 32-bit unconditionally (64-bit broker ints truncate -- range-check + error or promote to real with documented precision loss); integral reals auto-collapse to int so real/int subtype is not preserved across a BASIC hop. Both are in the fidelity table; both follow the strict/lenient switch.


6. Threading: the actor model

Each ScriptContextT owns one interpreter, one OS thread (pthreads -- chosen over C11 <threads.h> for portability/maturity), and one inbound MPSC message queue. Interpreters are single-threaded; only the owning thread ever enters callExport. A cross-context call is a message; the caller blocks for the reply on a per-call condvar future (lost-wakeup safe via a predicate loop). The verifier confirmed the core is sound: no path lets two threads touch one interpreter, the deep-copy ownership ledger is correct on the success path, and the epoll thread enqueuing while a context is mid-dispatch is race-free.

The fixes folded in from verification:

  • One error channel. The first cut carried a separate error ValueT in the reply that the caller never freed -- a leak on every errored call, lost error text, and a second source of truth contradicting the broker's "error travels in result" contract. Fix: on failure the adapter writes the error string into result (as brokerSetError does); the reply carries only {status, result}. One channel, one owner, freed once.
  • valueCopy checked on enqueue. Use the canonical int32_t valueCopy(dst, src), check each arg, and unwind partially-copied args on OOM (mirroring the broker route path). The actor layer should call the broker's marshalling, not reimplement a copy loop.
  • Shutdown drains everything. On SHUTDOWN, error-reply every queued CALL and free every queued/stashed REPLY (its result, which now also carries any error text) before join -- the first cut leaked in-flight replies unwound by a nested shutdown.
  • Explicit thread stack size. The reentrancy depth bound counts dispatch nesting, not C-stack bytes; set a validated stack size with pthread_attr_setstacksize (or lower the bound) so the "clean catchable depth error" promise actually holds instead of a UB overflow.
  • Split the ready-handshake condvar off the queue condvar so queueCond has exactly one semantic (latent lost-wakeup footgun if a second waiter is ever added).
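
The per-call reply future with its predicate loop can be sketched as follows. ReplyFutureT and the future* functions are hypothetical names, and the reply is reduced to a status. This shows only the plain blocking wait; the nested pump described in sec 6.1 would service the context's own queue inside the loop instead of sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call future: one waiter, one completer. */
typedef struct ReplyFutureT {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            done;     /* the predicate guarded by mutex */
    int32_t         status;   /* stands in for {status, result} */
} ReplyFutureT;

int futureInit(ReplyFutureT *f) {
    f->done = false;
    f->status = 0;
    if (pthread_mutex_init(&f->mutex, NULL) != 0) return -1;
    if (pthread_cond_init(&f->cond, NULL) != 0) {
        pthread_mutex_destroy(&f->mutex);
        return -1;
    }
    return 0;
}

/* Called by the thread that produced the reply. */
void futureComplete(ReplyFutureT *f, int32_t status) {
    pthread_mutex_lock(&f->mutex);
    f->status = status;
    f->done = true;                    /* set the predicate under the lock */
    pthread_cond_signal(&f->cond);
    pthread_mutex_unlock(&f->mutex);
}

/* Called by the caller; safe even if the reply arrived before the wait began. */
int32_t futureWait(ReplyFutureT *f) {
    pthread_mutex_lock(&f->mutex);
    while (!f->done) {                 /* loop handles spurious and early wakeups */
        pthread_cond_wait(&f->cond, &f->mutex);
    }
    int32_t status = f->status;
    pthread_mutex_unlock(&f->mutex);
    return status;
}

void futureDestroy(ReplyFutureT *f) {
    pthread_cond_destroy(&f->cond);
    pthread_mutex_destroy(&f->mutex);
}
```

Because the waiter checks done before sleeping, a completion that lands before futureWait is entered is not lost, which is the lost-wakeup property the text relies on.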

6.1 What a context does while blocked: always-live nested pump [DECIDED]

When context A makes a synchronous cross-context call and waits for the reply, A's thread pumps its own inbound queue instead of sleeping idle. An incoming call to A -- including a re-entrant B->A issued during the very call A is waiting on -- is serviced on A's own thread, then A resumes waiting. This was chosen over strict run-to-completion because it never deadlocks and needs no wait-for-graph deadlock detector; the rejected alternative would have had to raise a "synchronous call cycle" error on A->B->A and would leave a busy context unresponsive to other callers. The verifier validated the pump as sound: only A's thread ever enters A's interpreter (the single-threaded invariant holds), reply nesting is strict LIFO, and depth is bounded.

The contract this commits the runtime and script authors to:

  • Re-entry happens only at explicit cross-context call points (x = getUserInfo(), data = sockRecv(c)), never mid-statement -- a call point is a yield point.
  • Module-global state may differ after a cross-context call returns, because another call may have run on this context while it was outstanding (the same contract as any RPC). Local variables are unaffected.
  • Reentrancy is depth-bounded with a catchable error (backed by an explicit pthread stack size, sec 6 fixes), so runaway ping-pong fails cleanly instead of overflowing.

Script code stays plain synchronous-blocking regardless -- info = getUserInfo() just works; this only governs what the runtime does while a call is outstanding.


7. Networking and the dispatcher

One dedicated I/O thread runs epoll (Linux; kqueue/poll for portability) and owns no interpreter. Async socket primitives (sockConnect, sockListen, sockSend, sockRecv, sockClose, plus a timer) are registered once through the broker, so every language gets them. The recommended v1 model is synchronous-blocking at the script level: data = sockRecv(conn) parks the calling context on a reply future; when epoll reports readiness, the I/O thread builds a CALL/reply and enqueues it onto the owning context's queue, so the result lands on the right interpreter thread. Callbacks are opt-in on top: pass a valueFnE (e.g. onConnect(myFunc)) and the completion invokes it via the same dispatch, always back on its home thread. No separate async keyword, no per-engine coroutine support needed.

Fixes from verification:

  • The I/O command queue must be strict tail-append FIFO (a sockSend issued right after sockConnect must be processed after the connect that registered the handle); assert it.
  • The resolver/connect path must deep-copy host before the command is freed (the first cut had an unconditional free(cmd->host) that would UAF if stored by pointer).
  • Wake the epoll thread for new interest via eventfd/self-pipe; deregister on close; drain eventfd to EAGAIN.
  • Portability note: pthread_condattr_setclock(CLOCK_MONOTONIC) is absent on Darwin -- guard it with #ifdef and derive any monotonic timed wait accordingly (a monotonic deadline cannot be handed to a realtime-clock condvar).

8. Error model (one source of truth)

A NativeFnT returns a status int and, on failure, writes a human-readable message into result (a valueStringE tagged as an error, or a small error-struct convention). That single in-band channel crosses the actor boundary unchanged, is freed exactly once by the caller, and is surfaced into the calling engine as that engine's native error (luaL_error/lua_error for Lua, mb_raise_error for my-basic). There is no second error field anywhere.


9. Context lifetime and the registry (UAF fix)

The critical use-after-free: contexts were addressed by raw ScriptContextT* (and exports held raw owner pointers), while contextShutdown frees the context and destroys its mutex/cond at runtime -- so a foreign thread could enqueue onto a freed queueMutex.

Fix:

  • Address contexts by a stable integer id through a locked registry; never by raw pointer. contextEnqueue/contextCall/ioDispatch resolve id -> context under the registry lock and either hold the lock across enqueue or take a reference so the context (and its mutex) cannot be freed mid-enqueue.
  • Add a generation counter to context ids and to CallableT.ownerGen so a recycled id cannot misroute an in-flight completion to a different context.
  • contextShutdown: under the lock, mark dead and remove from the id map; reject new enqueues with a defined "dead context" error; drain and error-reply queued work; wait for in-flight references to drain; then free.

10. Function-value lifecycle across threads

CallableT.refCount is atomic. valueCopy bumps it; valueFree drops it. The subtlety: releasing the underlying closure is an interpreter op (luaL_unref / my-basic unref) that must run on the owner's thread. So when a drop reaches zero on a foreign thread, the broker posts a release message to the owner context instead of touching the interpreter directly; the owner releases the closure on its own thread and frees the CallableT. If the owner is already gone (alive == false), the CallableT shell is freed directly (the closure is already gone with the interpreter) and any pending invoke returns a clean error.


11. Build order

  1. Broker core: broker.h (the canonical ValueT/AggregateT/CallableT/enums), valueCopy/valueFree/valueMove with full tag coverage and the depth cap, the name->entry registry, brokerCall, the error convention. Unit-test value round-trips and deep-copy/free under a leak checker before any engine exists.
  2. Lua adapter against the core: trampoline, scalar+string marshalling, table deep-copy both directions with caps, native registration, export with the refcounted luaL_ref lifecycle and exports-array growth. Test C<->Lua and Lua-export-called-from-C single-threaded.
  3. my-basic adapter: lifecycle, the trampoline bank, the arg-frame protocol, mb_memdup string ownership, the collection mapping, and the parked __BROKERSERVE export frame. Test C<->BASIC and the full Lua<->broker<->BASIC round-trip against the fidelity table (assert the lossy edges error or coerce exactly as documented).
  4. Actor layer: ScriptContextT, the MPSC queue, the reply future, the id+generation registry, the single error channel, the chosen block-while-waiting semantics (sec 6.1), and shutdown drain. Stress cross-context calls and teardown under a thread sanitizer.
  5. Networking/dispatcher: epoll I/O thread, the FIFO command queue, the socket/timer natives, completion dispatch onto owning queues, callbacks via valueFnE.
  6. Squirrel (later): a third adapter validates that the vtable + canonical ValueT really make new engines O(1).

12. Open decisions

  • Block-while-waiting semantics: DECIDED -- always-live nested pump (sec 6.1).
  • Strict-vs-lenient default for the lossy marshal edges (recommend: strict by default so truncation/loss is an explicit error; lenient opt-in per call).
  • my-basic native-bank size (cap on natives per BASIC context).
  • Whether a foreign function injected into BASIC should be transparently callable as a routine value (cb(x)) or only via the portable CALL(fn, ...) primitive (Lua gets the transparent form for free; BASIC's transparent form needs confirming).

13. Implementation notes (as-built: broker core + both adapters)

Built and tested: broker.h/value.c/broker.c (core), luaAdapter.* (Lua 5.4), mybasicAdapter.* (vendored my-basic in vendor/), with testBroker/testLua/testMyBasic/testPolyglot -- 378 checks, clean under ASan+UBSan. The polyglot test proves the thesis: one C native called from both engines, and a Lua function invoked from a BASIC program through the broker.

Core refinement. The single global callable-release hook could not distinguish a Lua closure from a my-basic routine, so release is now a per-callable CallableReleaseFnT passed to callableCreate (design sec 10's "owner releases the closure", just synchronous for now). Added callableUserData so a release fn can reach its closure handle.

Lua adapter. Context pointer lives in lua_getextraspace. Native bindings are context-owned {fn,userData} structs referenced by a light-userdata upvalue on one shared trampoline. A Lua function crossing out becomes a CallableT over a pinned luaL_ref (released via luaL_unref in the per-callable release fn); transient callback args are freed automatically because valueFree drops the handle. A CallableT crossing in becomes a callable userdata with __call/__gc. Lua allocation APIs longjmp on OOM (no NULL checks). Caveat: release exported callables before luaContextDestroy (the luaL_ref lives in that state's registry).

my-basic adapter (the high-effort one; these rules were forced by ASan):

  • Build with -DMB_DOUBLE_FLOAT (double reals) and link -lm.
  • Native signature has no per-call userData and one interpreter userdata slot, so a macro-generated trampoline bank (MB_BANK_SIZE) supplies slot-specific entries that recover the binding from the context.
  • mb_register_func returns a count: nonzero = success, 0 = failure (inverted vs the usual MB_FUNC_OK == 0).
  • Ownership is asymmetric and was the main source of bugs (verified against the my-basic source during adversarial review):
    • A popped collection is owned by the consumer (mb_dispose_value after marshalling); a popped string is a borrowed interior pointer (copy, never free).
    • mb_set_coll copies a scalar/string key-value (dispose your copy after) but stores a collection by pointer without a reference -- so a nested collection needs an explicit mb_ref_value before the set, and must then NOT be disposed (the parent owns it).
    • mb_push_value transfers a collection, but borrows a string -- a string result must be pushed with mb_push_string (which marks it for lazy destroy), not mb_push_value.
    • mb_eval_routine borrows its arguments (it never frees them), so marshalled routine args are disposed by the caller after the call -- and the return value is marshalled out first, because a routine may return one of those borrowed arguments.
    • Routine values are not ref-counted (mb_ref_value/mb_unref_value corrupt them); a routine name must be uppercased before mb_get_routine (BASIC uppercases at parse).
    • int64 entering BASIC is range-checked to 32-bit int_t (brokerErrRangeE on overflow).
  • Routine export uses mb_get_routine(by name) + mb_eval_routine, both of which need a live void** l. That cursor only exists inside a native call, so the dispatch stashes it in currentL; an exported BASIC-routine CallableT is therefore valid only while the context is serving (a native frame is on the stack -- what the actor layer's parked __BROKERSERVE frame will guarantee). For now, fetch and invoke within one native call.
  • One interpreter per program: a my-basic context hosts a single program; reset+reload after disposing native-pushed collection intermediates is unreliable, so the tests spin a fresh context per run. (The actor layer will own one long-lived parked context per module, which sidesteps this.)

Build/verify. Core compiled strict (-Wconversion -Wsign-conversion); adapters drop those two (engine headers use wide macros) but keep -Wall -Wextra -Werror. All three engines are vendored under vendor/ and built from source -- vendor/lua (Lua 5.4.6, library = src/*.c minus the lua.c/luac.c mains), vendor/mybasic (my-basic), vendor/squirrel-src (Squirrel 3.2) -- each relaxed and un-sanitized but linked into the sanitized binaries so cross-boundary heap misuse is still caught. Nothing depends on a system-installed engine or pkg-config, so the build is reproducible. The Lua platform define is selected automatically: $(OS) first (Windows sets Windows_NT and has no uname -> no define / ISO C), else from uname -s (LUA_USE_LINUX + -ldl / LUA_USE_MACOSX / LUA_USE_POSIX). NB the project is otherwise Unix-only (pthreads, sanitizers, setarch), so the Windows branch only keeps the define correct.

Function-value lifecycle across threads (sec 10), DONE. callableInvoke and the final callableRelease are now thread-correct. The core exposes two installable hooks (callableSetInvokeHook/callableSetReleaseHook, the same pattern as brokerSetRouteHook) so it stays independent of the actor layer; actorInit installs them. An invoke from a thread other than the callable's owner is marshalled to the owner's thread by reusing the CALL machinery (a callable's fn+userData are exactly a native call -- callableFn is the one new accessor). The final reference drop is routed too: a new messageReleaseE posts the finalize (fire-and-forget) to the owner, which runs the engine release (luaL_unref / sq_release) on its own thread. callableFinalize is the shared "run release + free shell" tail; the core still runs it inline when no actor is present (so the single-threaded testCallableDead semantics -- a dead callable still runs its release on last drop -- are preserved). testEngineLua captures a Lua closure on its context's thread, then invokes and releases it from the main thread; both marshal to the owner, ASan/TSan-clean. Limit: releasing a callable whose owner context has been destroyed is the deferred non-quiescent-teardown case (sec 9) -- best-effort inline finalize for now.

JavaScript adapter (Duktape), the fourth engine. Vendored Duktape 2.7.0 (the single amalgamated duktape.c/duktape.h/duk_config.h) in vendor/duktape, built relaxed/un-sanitized. src/js/jsAdapter.* mirrors the Lua/Squirrel adapters: one shared trampoline recovers its binding from a hidden property on the function object (duk_push_current_function + an internal \xFF-prefixed key) and dispatches through brokerCall; marshalling covers scalars, binary-safe strings, and the hybrid aggregate (JS array <-> list, object <-> map) with the depth cap. JS numbers are doubles, so an integral in-range number round-trips as an int (else a real). A JS function crossing out becomes a refcounted CallableT over a Duktape heap pointer kept alive by a per-heap export registry object in the global stash (the slot is dropped on release) -- and it participates in the sec-10 cross-thread routing, so a JS closure captured on its context's thread is invoked and released from another thread correctly. src/js/jsEngine.* is the EngineT binding. testJs (single-threaded: scalar/string/array/object marshalling, export + invoke-from-C, closure-as-arg callback, error paths) and testEngineJs (threaded: cross-context call + the sec-10 callback) -- clean under ASan/UBSan and TSan (make tsanjs). Adding the engine touched zero lines of the broker core or actor layer (one adapter TU + one engine-binding TU + Makefile rules), re-confirming the O(1)-engine-add thesis. v1 limit (as with Squirrel): pushing a foreign CallableT into JS is unsupported.

Source layout. src/ holds the project source (core + actor directly in src/, one subdir per script language: src/lua, src/mybasic, src/squirrel, src/js); tests/ holds the test programs; obj/ collects every object file (ours and the vendored engines, via patsubst into obj/); bin/ collects the binaries. The Makefile finds our sources by VPATH and groups object rules by flag set; -MMD -MP generate header dependencies automatically. make clean removes obj/ and bin/. make test builds and runs all ten binaries; make tsan/make tsansq/make tsanjs are the ThreadSanitizer variants.

16. Threading model rewrite -- host-thread natives, fire-and-forget scripts

Supersedes the earlier "natives run inline on the calling context thread" model. The host's own thread is now an implicit host context (id 0): it has a queue but no OS thread of its own, and the host drives it by calling calogPump in its loop.

  • Scripts are fire-and-forget. calogContextEval(ctx, src) enqueues the script onto the context's thread and returns a status (not the result); the script runs asynchronously, and results come back by calling natives.
  • A registered native runs on the host thread, serialized. A script calling one posts a CALL onto the host queue and parks; the host runs it during calogPump. So host C code is never called concurrently and needs no locking. actorRoute inlines a call already on the host thread; otherwise it marshals to the host context (id 0).
  • calogRegisterInline is the escape hatch: the registry entry's runInline flag (which replaced ownerCtxId) makes the native run on the calling script's thread.
  • Errors from a fire-and-forget script are posted to the host queue and delivered to the CalogErrorFnT handler (calogSetErrorHandler) during calogPump (default: log to stderr).
  • Function values (CalogFnT) still run on their owning engine's thread -- sec 10 routing is unchanged, and calogFnInvoke from the host blocks-and-pumps the host queue while it waits (the same nested pump, now applied to the host context).
  • Nested eval is allowed: a new eval that arrives while a context is mid-script (parked on a native call) runs nested via pumpUntil -- consistent with the sec-6.1 re-entrancy contract (interpreters support nested pcall/peval).
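
The routing rule in the bullets above reduces to a small decision. The sketch below is a hypothetical reduction of it: the `Demo` names and the two-value result are invented here, and it leaves out the queueing, parking, and runtime-matching checks that the real actorRoute would also need.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HOST_CTX_ID 0u   /* the implicit host context is id 0 per the text */

typedef enum DemoRouteE {
    demoRouteInlineE,         /* run on the calling thread */
    demoRouteMarshalToHostE   /* post a CALL to the host queue and park */
} DemoRouteE;

/* Hypothetical registry entry: only the flag that matters for routing. */
typedef struct DemoRegEntryT {
    const char *name;
    bool        runInline;    /* set by the inline registration variant */
} DemoRegEntryT;

DemoRouteE demoRouteNative(const DemoRegEntryT *entry, uint64_t callerCtxId) {
    if (entry->runInline) return demoRouteInlineE;                 /* escape hatch */
    if (callerCtxId == DEMO_HOST_CTX_ID) return demoRouteInlineE;  /* already on the host thread */
    return demoRouteMarshalToHostE;                                /* serialized on the host */
}
```

Under these assumptions, a default-registered native called from a script context is always marshalled, which is why host C code needs no locking.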

API shape: calogRegister(c,name,fn,ud) / calogRegisterInline(...); calogContextOpen(c,engine) -> CalogContextT* (create+start merged, since nothing is registered between them anymore) and calogContextClose; calogContextEval(ctx,src) fire-and-forget; calogPump; calogSetErrorHandler. CalogConfigT and the createInterpreter config parameter are gone -- a context now exposes every registered native (the engine binding walks the registry via the internal calogForEach). Tests rewritten to drive calog the host way (register, open, eval, pump-until-a-native-records-the-result); testActor is now engine-free, exercising the dispatch machinery with C callables (calogFnCreate) on synthetic contexts. Verified: make test 441 checks across 11 binaries (incl. examples/embed.c), gcc + clang strict, ASan/UBSan + TSan clean (make tsan/tsansq/tsanjs).

CalogT owns its contexts; ids are unbounded. The active-context registry moved from context.c file-static globals into struct CalogT (now defined in calogInternal.h): a runtime owns both its native-function registry and its active-context registry (ctxMutex, ctxSlots, freelist). context.c reaches it via one runtime pointer set in calogActorInit (which also refuses a second runtime). So calogDestroy closes every still-open context automatically -- the host need not track them (a test opens 32 and never closes them; ASan confirms no leak). Context ids widened to uint64 (32-bit slot index + 32-bit generation), so neither the live count nor open/close churn hits a preset ceiling; calogContextId/calogCurrentId return uint64_t. The now-dead ownerGen parameter was dropped from calogFnCreate (generation lives in the packed id). Re-verified: make test 473 checks, ASan no leaks, TSan clean, gcc + clang strict.

Independent runtimes in one process. The one-runtime limit was not fundamental -- just process-global state that hadn't moved into CalogT. All of it has now moved: the host context, the routing hooks (routeHook/invokeHook/releaseHook), and the error sink are CalogT fields; a CalogFnT carries a runtime pointer so calogFnInvoke/release reach the right hooks (the callable path has no CalogT otherwise). The dispatch reaches its runtime through the object it already holds -- the route hook is handed its calog, the callable hooks read calogFnRuntime, context-thread code uses context->broker, and the rest take an explicit calog argument. The only remaining process global is currentContext, and it is thread-local (it names the calling thread's context). So the setters (calogSetRouteHook etc.) are gone -- calogActorInit assigns the fields directly -- and the runtime static that the earlier review flagged is deleted. Runtimes are isolated: don't pass a value or callable between them (a cross-runtime reply cannot route). A test spawns N threads, each creating, driving, and destroying its own runtime concurrently.

One thread may host several runtimes. calogPump(calog) sets currentContext to calog's host context for the drain and restores it after, so a single thread can drive many runtimes by pumping each in turn -- a native serviced during calogPump(A) sees calogCurrent() == A even if the thread also hosts B. Two consequences fall out and are handled: context ids number from 1 in every runtime, so the "already on the owner's thread" (inline) and "caller can take the token/pump path" (reply) decisions match the runtime too, not the id alone -- a foreign or wrong-runtime caller takes a reply box, which cannot misroute. A test creates two runtimes on one thread, runs a script in each, and pumps both in a loop, asserting each runtime's native resolved calogCurrent() to its own runtime (it fails if the pump doesn't rebind currentContext). Re-verified: make test 480 checks, ASan no leaks, TSan clean (both concurrent runtimes and one thread pumping two), gcc + clang strict.
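
The save/rebind/restore step that the pump performs can be shown with a toy model. Everything below is an assumption for illustration: the `Demo` types, the fixed-size job array standing in for the host queue, and jobs standing in for natives serviced during the drain.

```c
#include <stddef.h>

typedef struct DemoRuntimeT DemoRuntimeT;
typedef void (*DemoJobT)(DemoRuntimeT *owner);

/* Toy runtime: a tiny job list instead of a real message queue. */
struct DemoRuntimeT {
    DemoJobT      jobs[8];
    size_t        jobCount;
    DemoRuntimeT *seenCurrent;   /* what a serviced job observed as "current" */
};

/* Thread-local, so it names the calling thread's current runtime only. */
static _Thread_local DemoRuntimeT *demoCurrent = NULL;

DemoRuntimeT *demoCurrentRuntime(void) { return demoCurrent; }

int demoPost(DemoRuntimeT *rt, DemoJobT job) {
    if (rt->jobCount == 8) return -1;
    rt->jobs[rt->jobCount++] = job;
    return 0;
}

/* Rebind "current" for the duration of the drain, then restore the previous
 * value so pumping several runtimes from one thread cannot cross wires. */
void demoPump(DemoRuntimeT *rt) {
    DemoRuntimeT *saved = demoCurrent;
    demoCurrent = rt;
    for (size_t i = 0; i < rt->jobCount; i++) rt->jobs[i](rt);
    rt->jobCount = 0;
    demoCurrent = saved;
}

/* Stand-in for a native: record which runtime was current while it ran. */
void demoRecordCurrent(DemoRuntimeT *owner) { owner->seenCurrent = demoCurrentRuntime(); }
```

Posting `demoRecordCurrent` to two runtimes and pumping each in turn leaves each runtime's `seenCurrent` pointing at itself; dropping the rebind in `demoPump` would leave both NULL, which mirrors the failure the two-runtime test is said to catch.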

Loading a script by filename. Each engine carries a NULL-terminated extensions list ({"lua"} / {"js"} / {"nut"} / {"bas"}), and a host makes engines available for filename-based loading with calogRegisterEngine(calog, &engine). calogContextLoad(calog, base) then walks the registered engines in registration order (each engine's extensions in order), forms "<base>.<ext>", and the first one that fopens wins: it reads the file on the calling thread, opens a context on that engine, and loads the contents fire-and-forget -- returning the context (NULL if nothing matched or the load failed). Registration matters for more than search order: hardcoding the built-in engine vtables in the core would force-link all of them (and their vendored runtimes) into every binary, defeating the per-engine archives -- so the host opts in, and a binary that never references an engine pulls in none (testActor stays engine-free). Engine selection is fundamentally a build-time (link) choice, so calogRegisterBuiltinEngines (a header-inline in calog.h) registers exactly the engines whose CALOG_WITH_<ENGINE> macro is set -- the host defines those alongside the archives it links, and the inline emits nothing (references no engine) unless called, so it never force-links.
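
The search order described above (engines in registration order, each engine's extensions in order, first path that opens wins) can be sketched as follows. `DemoEngineT` and `demoFindScript` are hypothetical names; the sketch only locates the file and reports the matching engine, where the real loader would go on to read the contents and open a context.

```c
#include <stdio.h>

/* Hypothetical engine descriptor: just a name and a NULL-terminated extension list. */
typedef struct DemoEngineT {
    const char  *name;
    const char **extensions;
} DemoEngineT;

/* Returns the index of the first engine for which "<base>.<ext>" can be opened,
 * leaving that path in `path`; returns -1 (and an empty path) if nothing matches. */
int demoFindScript(const DemoEngineT *engines, int engineCount,
                   const char *base, char *path, size_t pathSize) {
    for (int e = 0; e < engineCount; e++) {
        for (const char **ext = engines[e].extensions; *ext != NULL; ext++) {
            int n = snprintf(path, pathSize, "%s.%s", base, *ext);
            if (n < 0 || (size_t)n >= pathSize) continue;   /* would not fit: skip */
            FILE *f = fopen(path, "rb");
            if (f != NULL) {
                fclose(f);
                return e;
            }
        }
    }
    if (pathSize > 0) path[0] = '\0';
    return -1;
}
```

Because the walk is ordered, registering Lua before JavaScript means `app.lua` shadows `app.js` when both exist, so registration order is effectively the host's precedence setting.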

my-basic as an actor engine. Making my-basic loadable meant running it under the actor model for the first time, which exposed two things. (1) Its native dispatch called the C function directly instead of through calogCall, so natives ran on the my-basic context thread rather than marshalling to the host -- fixed by routing mbDispatch through calogCall (the binding now stores the registry name), matching the other engines. (2) my-basic keeps process-global state -- lazy mb_init singletons and a global _mb_allocated counter touched on every allocation (forced on in the vendored header) -- so two my-basic contexts on different threads race (TSan-confirmed). The singletons are built once by mb_init and read-only thereafter, so the only execution-time shared write is that counter; a one-line vendored patch makes it _Atomic (the original is preserved as vendor/mybasic/myBasic.c.orig). With the counter safe, the my-basic engine (not the adapter, which stays usable single-threaded and lock-free) needs a lock only across lifecycle -- mb_init's first-context build, mb_dispose's last-context teardown, and the shared context refcount -- and NOT across runSource, so several my-basic scripts execute concurrently. A tsanmb target proves the parallel case is race-free (verified further by a 4-context stress running arithmetic, strings, lists, and booleans). Verified: make test 494 checks (13 binaries), ASan no leaks, TSan clean on all four engines (tsan/tsansq/tsanjs/tsanmb), gcc + clang strict.

15. Public embedding API (calog.h) -- as-built (superseded by sec 16 for threading/API)

calog is packaged as an embedding library: a host links it, registers its own native C functions, creates script contexts on an engine, and runs scripts. Every public symbol carries a calog prefix (types Calog...T, enums Calog...E) so the library is a good citizen in a host binary. The API was curated to the minimum:

  • One handle, one header. CalogT is the runtime; calogCreate() composes the registry with the actor layer (installs the routing hooks) and calogDestroy() tears both down -- no separate init/shutdown for the host. The entire embedding surface is src/calog.h (~30 functions); internal machinery (the registry entry type, the route/invoke/release hooks, the low-level callable lifecycle, the split calogBrokerCreate/calogActorInit) lives in src/calogInternal.h, which host code never includes. calog.h leaks no internal symbol.
  • One config type. The three per-engine configs collapsed into CalogConfigT (exposeNames + exposeCount), used by every built-in engine vtable.
  • Value model unchanged, just prefixed. CalogValueT/CalogAggT/CalogFnT + constructors (calogValueInt, ...), ops (calogValueCopy/Free/Move/Equals), aggregates (calogAgg*), function values (calogFnInvoke/Retain/Release), and calogFail/calogTypeName for writing natives. Contexts: calogContextCreate/Start/Eval/Destroy/Id, plus calogCurrentId/calogCurrent for natives. The built-in engine vtables are calogLuaEngine, calogJsEngine, calogSquirrelEngine (a host may also supply a custom CalogEngineT).
  • Packaging. make builds lib/libcalog.a (calog itself: core + actor + every adapter/binding) and separate vendored-engine archives (liblua.a, libduktape.a, libsquirrel.a, libmybasic.a). A host links libcalog.a plus whichever engine archives it uses; unused adapters (and their engine deps) stay unlinked because static members are pulled only when referenced -- so a JS-only host never links Lua/Squirrel. The tests consume the archives; the threaded/engine tests use only calog.h, validating that the public surface is complete. examples/embed.c is a ~30-line host (public header only) that registers a native and calls it from JavaScript.
  • Reconfirmed: rename + restructure kept all 441 checks passing across 10 test binaries, clean under ASan/UBSan and TSan (all four engines), gcc + clang strict.

14. Implementation notes (as-built: actor layer, engine-on-a-thread, Squirrel)

The actor layer (context.h/context.c, build step 4) is built and tested: testActor exercises cross-context routing, the always-live nested pump (the re-entrant A->B->A deadlock test), and a concurrent fan-out stress; clean under ASan+UBSan and ThreadSanitizer (make tsan, run under setarch -R -- some kernels hand out more mmap randomization than TSan's shadow tolerates). One thread + one MPSC queue per ScriptContextT; brokerCall routes through an installed hook (brokerSetRouteHook) so owner-0/same-context calls run inline and others marshal to the owning thread; an external caller blocks on a private reply box, a context caller pumps. The reply carries only {status, result} -- the single error channel (sec 8) is structural, the error string rides in result. contextSendBlocking and contextReply are the shared enqueue-wait and reply tails behind both CALL and EVAL dispatch.

Generationed registry (sec 9), DONE. Context ids pack a 16-bit slot index and a 16-bit generation; the registry is a slot table plus a freelist. contextDestroy unlinks a context under the registry lock (after stopping+joining its thread) and returns the slot to the freelist; the next reuse bumps the generation. A stale id (slot since freed/recycled) resolves to brokerErrDeadE, never misroutes to the recycler -- testActor's generation test proves it. The registry lock is held across enqueue, so a foreign enqueue cannot race a destroy onto a freed queue mutex. Still quiescence-assuming (no call to the context in flight at teardown); in-flight reference draining is the remaining sec 9 hardening.
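
A slot table with a freelist and a packed (generation, slot) id can be modelled in a few lines. The following is a single-threaded sketch under stated assumptions: the `Demo` names, the four-slot capacity, bumping the generation on every allocation, and returning NULL for a dead id are all choices made here, and the registry lock the text describes is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 4

typedef struct DemoSlotT { void *context; uint16_t generation; } DemoSlotT;

typedef struct DemoRegistryT {
    DemoSlotT slots[DEMO_SLOTS];
    uint16_t  freeList[DEMO_SLOTS];   /* stack of free slot indices */
    int       freeCount;
} DemoRegistryT;

void demoRegistryInit(DemoRegistryT *r) {
    for (int i = 0; i < DEMO_SLOTS; i++) {
        r->slots[i].context = NULL;
        r->slots[i].generation = 0;
        r->freeList[i] = (uint16_t)(DEMO_SLOTS - 1 - i);   /* pop yields slot 0 first */
    }
    r->freeCount = DEMO_SLOTS;
}

/* 16-bit generation in the high half, 16-bit slot index in the low half. */
static uint32_t demoPackId(uint16_t slot, uint16_t gen) {
    return ((uint32_t)gen << 16) | slot;
}

/* Returns a packed id, or 0 when the table is full. */
uint32_t demoRegistryAdd(DemoRegistryT *r, void *context) {
    if (r->freeCount == 0) return 0;
    uint16_t slot = r->freeList[--r->freeCount];
    if (++r->slots[slot].generation == 0) r->slots[slot].generation = 1;  /* keep ids nonzero */
    r->slots[slot].context = context;
    return demoPackId(slot, r->slots[slot].generation);
}

/* A stale id (slot freed or recycled since) resolves to NULL, never to the new occupant. */
void *demoRegistryResolve(const DemoRegistryT *r, uint32_t id) {
    uint16_t slot = (uint16_t)(id & 0xFFFFu);
    uint16_t gen  = (uint16_t)(id >> 16);
    if (slot >= DEMO_SLOTS) return NULL;
    const DemoSlotT *s = &r->slots[slot];
    if (s->context == NULL || s->generation != gen) return NULL;
    return s->context;
}

bool demoRegistryRemove(DemoRegistryT *r, uint32_t id) {
    if (demoRegistryResolve(r, id) == NULL) return false;
    uint16_t slot = (uint16_t)(id & 0xFFFFu);
    r->slots[slot].context = NULL;
    r->freeList[r->freeCount++] = slot;   /* slot becomes available for reuse */
    return true;
}
```

After a remove followed by an add, the new id shares the old slot index but differs in its generation half, so a caller still holding the old id gets the "dead" answer instead of reaching the recycler.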

Engine on a thread (the EngineT vtable). EngineT gained runSource; contextEval(context, source, result) marshals a script run onto the context's own thread (a new messageEvalE) and blocks like a call. Each adapter's engine binding lives in its own TU (luaEngine.*, squirrelEngine.*) -- the only Lua/Squirrel files that depend on the threading layer, keeping the adapters thread-agnostic so testLua/testPolyglot link them without context.o/pthread. createInterpreter runs on the thread and exposes the configured natives there. Crucially, the exposed-native trampolines now dispatch through brokerCall (by broker+name) instead of a captured fn pointer, so an exposed native owned by another context is transparently routed to its thread -- the script author still writes doubleIt(21). With no route hook installed this is identical to the old inline path (testLua still passes). testEngineLua proves a real Lua interpreter on a context thread calling a thread-agnostic native and a cross-context native, on the correct threads.
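
Dispatching by name through the broker, rather than through a captured function pointer, is what lets a route hook intercept the call. The sketch below models that with invented names (`DemoBrokerT`, `demoCall`, `DemoRouteHookT`) and a one-field stand-in for ValueT; only the four-argument native signature follows the NativeFnT shape given in section 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct DemoValueT { int64_t i; } DemoValueT;   /* int-only stand-in for ValueT */

typedef int32_t (*DemoNativeFnT)(DemoValueT *args, int32_t argCount,
                                 DemoValueT *result, void *userData);

typedef struct DemoBindingT {
    const char   *name;
    DemoNativeFnT fn;
    void         *userData;
} DemoBindingT;

/* Hypothetical hook: a threading layer could marshal the call elsewhere. */
typedef int32_t (*DemoRouteHookT)(const DemoBindingT *entry, DemoValueT *args,
                                  int32_t argCount, DemoValueT *result);

typedef struct DemoBrokerT {
    DemoBindingT   entries[8];
    int            count;
    DemoRouteHookT routeHook;   /* NULL means "run inline" */
} DemoBrokerT;

int32_t demoRegister(DemoBrokerT *b, const char *name, DemoNativeFnT fn, void *userData) {
    if (b->count == 8) return -1;
    b->entries[b->count++] = (DemoBindingT){name, fn, userData};
    return 0;
}

/* What a trampoline would call: look the native up by name, then either hand
 * it to the hook or run it inline when no hook is installed. */
int32_t demoCall(DemoBrokerT *b, const char *name, DemoValueT *args,
                 int32_t argCount, DemoValueT *result) {
    for (int i = 0; i < b->count; i++) {
        const DemoBindingT *e = &b->entries[i];
        if (strcmp(e->name, name) != 0) continue;
        if (b->routeHook != NULL) return b->routeHook(e, args, argCount, result);
        return e->fn(args, argCount, result, e->userData);
    }
    return -1;   /* not found */
}

/* Example native matching the doubleIt(21) call in the text. */
int32_t demoDoubleIt(DemoValueT *args, int32_t argCount, DemoValueT *result, void *userData) {
    (void)userData;
    if (argCount != 1) return -1;
    result->i = args[0].i * 2;
    return 0;
}
```

With `routeHook` left NULL, `demoCall(&b, "doubleIt", ...)` behaves exactly like a direct call, which is the property that lets single-threaded tests keep passing unchanged.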

Squirrel adapter (sec 11 step 6), the O(1)-engine-add validation. Vendored Squirrel 3.2 in vendor/squirrel-src (C++), built relaxed/un-sanitized with -D_SQ64 -DSQUSEDOUBLE so SQInteger/SQFloat are 64-bit int / double matching ValueT -- the adapter shares those defines so the ABI matches. squirrelAdapter.* mirrors the Lua adapter: one shared trampoline recovers its binding from the closure's single free variable (which the VM pushes onto the stack after the args, so it sits at the top -- verified in sqvm.cpp CallNative), marshals scalars, binary-safe strings, and the hybrid aggregate (array<->list, table<->map) with the shared depth cap, and dispatches through brokerCall. testEngineSquirrel runs a real VM on a thread doing the cross-context call plus string and array round-trips; clean under ASan+UBSan and TSan (make tsansq). The total surface a new engine added: one adapter TU + one engine-binding TU + Makefile rules -- no change to the broker core or the actor layer, which is the thesis.

Squirrel closure export, DONE. A Squirrel closure crossing the boundary now becomes a refcounted CallableT over a pinned HSQOBJECT (sq_addref/sq_release, mirroring Lua's luaL_ref lifecycle): squirrelExport fetches a named global closure, and a closure passed as a native argument is exported the same way during ingress (the VM's foreign pointer -- finally used -- recovers the owning context). squirrelCallableInvoke runs sq_pushobject+sq_call on the owner's VM and marshals the return; squirrelCallableRelease sq_releases on the owner thread. Single-threaded testSquirrel covers export+invoke-from-C, a closure passed as an argument and called back through the broker, and the not-found/type-error paths; ASan-clean (no addref/release leak). Caveat (same as Lua): release exported callables before squirrelContextDestroy. Remaining limit: the reverse direction (a foreign CallableT pushed INTO Squirrel so a script can call it) returns brokerErrUnsupportedE -- Squirrel has no clean callable-userdata-with-__gc like Lua, so it needs a class instance with a _call metamethod + release hook.

make test runs all seven binaries (411 checks). make tsan covers the actor core and the Lua engine path; make tsansq the Squirrel path.

Adversarial review (3 parallel reviewers: actor concurrency, Squirrel adapter, Lua trampoline + engine bindings). Two real defects found and fixed:

  • NULL-interpreter crash. threadMain ignores createInterpreter's status, so a failed create (e.g. a config expose-name that was never registered) left a context serving with interp == NULL; contextDispatchEval only checked runSource != NULL, so the first eval called runSource(NULL,...) -> NULL deref. Fixed by guarding interp == NULL (the context still serves native calls, just rejects evals); regression test in testEngineLua (testFailedInterpreter).
  • OOM lost-wakeup. contextReply's context-caller branch allocated a fresh REPLY and, on calloc failure, dropped the wakeup -- the caller hung in pumpUntil forever. Fixed by reusing the request message as the reply (it already carries the token and replyToId), which removes the allocation entirely, so the wakeup can no longer be lost to OOM.

The Squirrel adapter was traced clean against the real Squirrel source (trampoline free-var indexing, stack balance, ValueT ownership, the throwerror/free order, binary strings); added sq_reservestack guards before the recursive marshallers to match the Lua adapter's lua_checkstack discipline. Documented (not changed): the registry must be frozen before contexts start (brokerCall reads it locklessly from context threads -- noted in broker.h); teardown still assumes quiescence (sec 9); and the hybrid-aggregate-to-Squirrel-table egress flattens array indices and integer keys into one table (same lossy edge as elsewhere in the fidelity table).

17. Engine expansion -- QuickJS-ng, and three new languages (Berry, s7, Wren)

calog now ships seven engines. Each is one adapter TU (marshalling + native trampoline + the sec-10 callable export) plus one engine-binding TU (the four-hook CalogEngineT), a vendored-from-source archive, a testEngine*, and a tsan* target -- the core, the actor layer, and calog.h were untouched, re-confirming the O(1)-per-engine thesis. Which engines a binary pulls in is a link-time choice: the header-inline calogRegisterBuiltinEngines references only the CALOG_WITH_<ENGINE>-selected vtables, so testActor still links zero engine code.

QuickJS-ng replaces Duktape (same calogJsEngine / .js, same jsAdapter.h / jsEngine.c -- only jsAdapter.c and the Makefile changed). The wins: a JS BigInt round-trips to int64 exactly (JS_ToBigInt64), closing the double-only fidelity gap Duktape had (proven by a 2^53+1 test); JS functions are refcounted JSValues (JS_DupValue / JS_FreeValue), replacing the Duktape heap-pointer-pinning registry behind CalogFnT; and the broker/name binding rides on JS_SetContextOpaque + JS_NewCFunctionData. One gotcha: a missing global reads back as undefined (not an error), so calogJsExport maps JS_IsUndefined to not-found while a bound non-function stays a type error. Core library = quickjs.c + libregexp.c + libunicode.c + dtoa.c, built with -D_GNU_SOURCE.

Berry (.be) is a Lua-like stack VM (64-bit ints, binary-safe be_pushnstring). Natives are Berry native closures carrying two upvalues (the context comptr and the name), recovered with be_getupval(vm, 0, pos); a Berry function crossing out is pinned under a uniquely-named hidden global (globals are GC roots) and dropped by setting it to nil. The sharp edge: be_pcall(vm, argc) leaves the result in the function's slot (base+1), not at -1 (which holds the last stale argument). Vendoring needs Berry's coc codegen prebuild plus its OS port (be_port.c) and module/class tables (be_modtab.c). Aggregate marshalling is a v1 limit (scalars + strings + callbacks are complete).

s7 Scheme (.scm) uses the current official s7 (an older mirror lacked s7_free, which would leak a heap per context). Since s7 native functions carry no user data, all natives route through one generic %calog-call dispatcher plus a per-name Scheme wrapper ((define (report . a) (apply %calog-call "report" a))); the context rides on a *calog-context* c-pointer global. Callables are kept alive by s7_gc_protect and invoked with s7_call. Because s7_eval_c_string evaluates a single form, calogS7Run wraps the (escaped) source in (catch #t (lambda () (eval-string …)) handler), so both read and run errors surface as a value -- a marker pair the runner detects. int64 and Scheme lists round-trip both ways; the keyed part (map) is a v1 limit. s7's intentional "permanent string" interning (which s7_free does not reclaim) is a small, bounded allocation, suppressed with a documented, allocation-site-specific __lsan hook. s7 is per-interpreter thread-safe -- no serialization needed (unlike my-basic).
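
Generating the per-name Scheme wrapper is plain string formatting. The helper below is a hypothetical sketch (`demoFormatS7Wrapper` is not a name from the source) that produces the wrapper text quoted above; it assumes native names are plain identifiers and rejects anything that would need escaping, and it does not call into s7 at all.

```c
#include <stdio.h>
#include <string.h>

/* Writes "(define (<name> . a) (apply %calog-call "<name>" a))" into out.
 * Returns 0 on success, -1 if the name needs escaping or the buffer is too small. */
int demoFormatS7Wrapper(char *out, size_t outSize, const char *name) {
    /* Assumption: names are simple identifiers, so reject quote, backslash,
     * whitespace, and parentheses instead of escaping them. */
    if (name[0] == '\0' || strpbrk(name, "\"\\ \t\n()") != NULL) return -1;
    int n = snprintf(out, outSize,
                     "(define (%s . a) (apply %%calog-call \"%s\" a))", name, name);
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

The resulting string would then be evaluated once per exposed native, so every script-visible name funnels into the single generic dispatcher.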

Wren (.wren) is the outlier: Wren has no bare function calls, so every native is reached through a single foreign method. A preamble defines class Calog { foreign static call_(name, args) }, and scripts call Calog.call_("report", [42]); the C dispatcher recovers the context from wrenGetUserData, marshals the argument list, and dispatches through calogCall. A Wren function crossing out is a retained WrenHandle, invoked with a cached per-arity call(_) handle. Wren numbers are IEEE doubles, so int64 above 2^53 loses precision (the same edge my-basic and old-JS have). Lists round-trip; map read is a v1 limit. Wren keeps no process-global state, so contexts run in parallel.
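
The 2^53 edge shared by the double-only engines can be checked directly. The two helpers below are illustrative only (the names and the choice of ±2^53 as the "in-range" bound are assumptions, not the adapters' actual rules): one tests whether an int64 survives a trip through a double, the other decides whether a double can be handed back as an exact int.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when v comes back unchanged after int64 -> double -> int64. */
bool demoInt64SurvivesDouble(int64_t v) {
    double d = (double)v;
    /* 2^63 does not fit in int64_t; converting it back would be undefined. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0) return false;
    return (int64_t)d == v;
}

/* True (and *out set) when d is integral and within +/-2^53, the range in
 * which every integer has an exact double representation. NaN fails the
 * range test because comparisons with NaN are false. */
bool demoDoubleToInt64(double d, int64_t *out) {
    if (!(d >= -9007199254740992.0 && d <= 9007199254740992.0)) return false;
    int64_t i = (int64_t)d;
    if ((double)i != d) return false;   /* had a fractional part */
    *out = i;
    return true;
}
```

2^53 itself survives the round trip, while 2^53 + 1 rounds to a neighbouring double and does not, which is the gap an exact int64 path (such as a BigInt conversion) closes.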

Verified across all seven engines: make test (525 checks), ASan/UBSan clean, a tsan<engine> target clean for each, and gcc + clang strict on the core.