calog -- Polyglot Script Broker: Design & Implementation Plan
A C "broker" that lets one application be written in a mix of scripting languages (Lua and my-basic first; Squirrel and others later). Native C functions are added once and become callable from every language. Functions and data exported from one module are callable from modules written in another language. Threading is actor-based; networking rides the same dispatcher. Data sharing is by-value for v1.
This document is the reconciled output of a design pass plus an adversarial verification pass. Where the verification corrected the first-cut design, the correction is folded in and noted as "[verified]".
1. Architecture: hub and spoke
Nothing talks to anything else directly. Every engine talks only to the broker, through two shared contracts:
- One universal value type, ValueT (a tagged union).
- One uniform native-function signature:
typedef int32_t (*NativeFnT)(ValueT *args, int32_t argCount, ValueT *result, void *userData);
A developer writes a native function once against that signature and registers it once.
A script function exported from a module is itself stored as a NativeFnT whose body
re-enters its owning interpreter -- so "call C from script", "call script from C", and
"call module A's function from module B" are all the same code path. Adding an engine is
O(1) adapter work, not O(N) per existing engine.
The single most important lesson from the verification pass: there must be exactly one
ValueT / AggregateT / ValueTypeE, defined in broker.h, included verbatim by every
adapter. The first-cut design had three divergent copies and would not have linked, let
alone round-tripped data. Section 2 is therefore the load-bearing part of this plan.
2. The canonical type system (broker.h) -- single source of truth
2.1 Value tags and the value struct
typedef enum ValueTypeE {
    valueNilE       = 0,
    valueBoolE      = 1,
    valueIntE       = 2, // int64_t
    valueRealE      = 3, // double
    valueStringE    = 4, // length-prefixed, binary-safe
    valueAggregateE = 5, // hybrid array + map container
    valueFnE        = 6  // function value: refcounted handle to a CallableT
} ValueTypeE;

typedef struct StringT {
    char    *bytes;  // owned; always NUL-terminated at bytes[length] for C consumers
    int64_t  length; // byte count excluding the convenience terminator (binary-safe)
} StringT;

typedef struct ValueT {
    ValueTypeE type;
    union {
        bool               b;
        int64_t            i;
        double             r;
        StringT            s;
        struct AggregateT *agg;      // heap-owned subtree
        struct CallableT  *callable; // refcounted, broker-owned
    } as;
} ValueT;
2.2 The aggregate (one shape both engines map onto)
typedef struct PairT {
    ValueT key;   // marshal layer constrains to int/real/string keys (see 2.5)
    ValueT value;
} PairT;

typedef enum AggregateKindE {
    aggregateListE = 0, // empty container round-trips as a list by default
    aggregateMapE  = 1,
    aggregateBothE = 2  // array part AND pairs part populated
} AggregateKindE;

typedef struct AggregateT {
    AggregateKindE kind;       // disambiguates empty/mixed containers across engines
    ValueT        *array;      // dense elements [0, arrayCount)
    int64_t        arrayCount;
    int64_t        arrayCap;
    PairT         *pairs;      // map part; preserves insertion order
    int64_t        pairCount;
    int64_t        pairCap;
} AggregateT;
A Lua table's sequence part maps to array, its remaining keys to pairs. A my-basic
LIST maps to array, a DICT maps to pairs. The explicit kind flag fixes two
problems the verifier flagged: an empty {} had no defined type on the far side, and a
mixed array+hash Lua table had no representation at all.
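As a minimal sketch of how the kind flag could be chosen on Lua ingress, where the table itself carries no list/dict hint, the helper below derives it from how many entries landed in each part. The name aggregateClassify is hypothetical and not part of broker.h; an engine that already knows the container type (a my-basic DICT, say) would set kind directly instead.

```c
#include <stdint.h>

// Mirrors AggregateKindE from section 2.2.
typedef enum AggregateKindE {
    aggregateListE = 0,
    aggregateMapE  = 1,
    aggregateBothE = 2
} AggregateKindE;

// Hypothetical helper: pick the kind from the populated parts of the container.
AggregateKindE aggregateClassify(int64_t arrayCount, int64_t pairCount) {
    if (arrayCount > 0 && pairCount > 0) {
        return aggregateBothE;  // mixed array + hash table
    }
    if (pairCount > 0) {
        return aggregateMapE;
    }
    return aggregateListE;      // also covers the empty container: list by default
}
```

An empty {} therefore classifies as a list, which matches the default stated in the enum comment.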
2.3 Function values: CallableT
valueFnE is the one deliberate exception to "by-value everything". A function cannot be
meaningfully copied between heaps, so it is shared by reference -- but safely, because it
is only ever invoked, never inspected, and invocation always routes to the owning
context's thread.
typedef struct CallableT {
    NativeFnT fn;          // uniform invoke entry
    void     *userData;    // luaL_ref slot, pinned mb_value_t, or C closure ctx
    uint32_t  ownerCtxId;  // context whose thread MUST run it
    uint32_t  ownerGen;    // generation of that context (UAF guard, see sec 9)
    int32_t   refCount;    // ATOMIC; shared-handle lifetime across threads
    bool      alive;       // false once the owning context is torn down
} CallableT;
Rules (these resolve the verifier's critical "function value breaks the no-shared-pointer invariant" finding):
- valueCopy of a valueFnE does an atomic refcount increment on the same CallableT -- it does NOT clone the closure. The shared CallableT* across threads is allowed precisely because refCount is atomic and the closure is touched only on its owner thread.
- valueFree of a valueFnE does an atomic decrement. When it hits zero, the underlying closure must be released with an interpreter call (luaL_unref, my-basic unref) -- which can only run on the owner's thread. So a zero-drop on a foreign thread posts a release message to the owner context rather than calling the interpreter directly (sec 10).
- Invoking a dead handle (alive == false, owner gone) returns a clean broker error, never a call into a freed interpreter.
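The refcount rules can be sketched with C11 atomics. This is an illustration under assumptions, not the broker's code: CallableSketchT is a reduced stand-in for CallableT, and ReleaseActionE, callableRetain, and callableDrop are hypothetical names. The drop reports what the caller should do next instead of touching an interpreter itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Reduced stand-in for CallableT: only the fields this sketch needs.
typedef struct CallableSketchT {
    uint32_t    ownerCtxId;
    atomic_int  refCount;
    atomic_bool alive;
} CallableSketchT;

// Hypothetical result of a drop: what the caller must do next.
typedef enum ReleaseActionE {
    releaseNoneE,    // other handles remain
    releaseInlineE,  // last handle dropped on the owner thread, or owner already gone
    releasePostE     // last handle dropped on a foreign thread: post release to owner
} ReleaseActionE;

// valueCopy of a function value: share the same handle, never clone the closure.
void callableRetain(CallableSketchT *c) {
    atomic_fetch_add_explicit(&c->refCount, 1, memory_order_relaxed);
}

// valueFree of a function value: decide where the final release has to run.
ReleaseActionE callableDrop(CallableSketchT *c, uint32_t callerCtxId) {
    // fetch_sub returns the previous count; 1 means this was the last handle.
    if (atomic_fetch_sub_explicit(&c->refCount, 1, memory_order_acq_rel) != 1) {
        return releaseNoneE;
    }
    if (!atomic_load(&c->alive)) {
        return releaseInlineE;  // closure died with its interpreter; free the shell
    }
    return (callerCtxId == c->ownerCtxId) ? releaseInlineE : releasePostE;
}
```

Returning an action keeps the interpreter call out of the refcount path, so the same drop is safe to run on any thread.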
2.4 Value operation contract (one signature, used everywhere)
The verifier caught two designs declaring valueCopy with incompatible signatures. The
canonical form -- status-returning, dst-by-pointer, so OOM is checkable on the hot path:
int32_t valueCopy(ValueT *dst, const ValueT *src); // deep copy; brokerOkE / brokerErrOomE
void valueFree(ValueT *v); // recursive free; leaves a safe nil
void valueMove(ValueT *dst, ValueT *src); // zero-alloc ownership transfer
valueFree and valueCopy MUST have a default:/explicit case for every tag including
valueFnE -- a missing case was how the first cut silently leaked function payloads.
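A minimal sketch of the three operations over a reduced value type (nil, int, string only) may make the contract concrete. The numeric values of the status codes, the brokerErrTypeE code, and the reduced ValueT are assumptions for illustration; the real versions also cover bool, real, aggregate, and function tags, plus the depth cap.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Status names follow the text; the numeric values and brokerErrTypeE are assumptions.
enum { brokerOkE = 0, brokerErrOomE = 1, brokerErrTypeE = 2 };

// Reduced subset of the canonical tags and value struct.
typedef enum { valueNilE = 0, valueIntE = 2, valueStringE = 4 } ValueTypeE;
typedef struct { char *bytes; int64_t length; } StringT;
typedef struct { ValueTypeE type; union { int64_t i; StringT s; } as; } ValueT;

// Frees owned payloads and leaves a safe nil behind.
void valueFree(ValueT *v) {
    switch (v->type) {
        case valueStringE: free(v->as.s.bytes); break;
        case valueNilE:
        case valueIntE:    break;
        default:           break;  // the full version names every tag explicitly
    }
    memset(v, 0, sizeof *v);       // type 0 is valueNilE
}

// Deep copy; dst is a safe nil on any failure.
int32_t valueCopy(ValueT *dst, const ValueT *src) {
    memset(dst, 0, sizeof *dst);
    switch (src->type) {
        case valueNilE:
            return brokerOkE;
        case valueIntE:
            dst->type = valueIntE;
            dst->as.i = src->as.i;
            return brokerOkE;
        case valueStringE: {
            size_t len = (size_t)src->as.s.length;
            char *bytes = malloc(len + 1);
            if (bytes == NULL) return brokerErrOomE;
            if (len > 0) memcpy(bytes, src->as.s.bytes, len);  // binary-safe
            bytes[len] = '\0';                                 // convenience terminator
            dst->type = valueStringE;
            dst->as.s.bytes = bytes;
            dst->as.s.length = src->as.s.length;
            return brokerOkE;
        }
        default:
            return brokerErrTypeE;  // tag outside this sketch's subset
    }
}

// Zero-alloc ownership transfer: src becomes nil.
void valueMove(ValueT *dst, ValueT *src) {
    *dst = *src;
    memset(src, 0, sizeof *src);
}
```

Because valueFree and valueMove both leave a nil behind, freeing a moved-from or already-freed value is harmless.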
2.5 Cross-engine fidelity table (the honest limits)
By-value marshalling across Lua <-> broker <-> my-basic is lossless for the common cases
and lossy at documented edges. These are inherent to my-basic's value model (verified
against my_basic.h/.c), not marshalling bugs:
| Aspect | Lua | my-basic | Crossing Lua <-> BASIC |
|---|---|---|---|
| integers | 64-bit | int_t == int (32-bit, always) | truncates above 2^31 -- range-check + error |
| reals | double | float (double w/ -DMB_DOUBLE_FLOAT) | precision loss unless double build |
| int vs real subtype | distinct (5.4) | integral real auto-collapses to int | subtype not preserved through BASIC |
| strings | binary-safe (length) | bare char* (strlen) | embedded NUL truncates -- detect + error |
| array / list | table sequence part | LIST | OK |
| map / dict | table hash part | DICT (int/real/string keys) | OK |
| mixed array+hash table | one table | no single LIST+DICT value | collapses to DICT (array part -> int keys), documented |
| empty {} | table | LIST or DICT? | kind flag; default LIST |
| function value | closure (luaL_ref) | lambda/routine (MB_DT_ROUTINE) | OK via CallableT (by-reference) |
| nested depth | bounded | bounded | one shared cap; defined error past it |
| cycles | rejected on ingress | rejected on ingress | impossible at rest (by-value, no shared refs) |
Policy decisions baked in to make the above deterministic:
- One shared recursion-depth cap applied on every recursive path -- both ingress and egress, all components -- failing with a defined status instead of overflowing the C stack. (The first cut bounded only Lua ingress.)
- One strict/lenient switch, owned by the broker and honored by both adapters: in strict mode an unrepresentable value/key (e.g. a function used as a table key, a 64-bit int into BASIC, a NUL-bearing string into BASIC) is an error; in lenient mode it is dropped/coerced with a documented rule. Never a silent surprise either way.
- Keys: broker allows int/real/string keys (my-basic dicts accept all three). Float keys get defined equality; non-representable keys follow the strict/lenient switch.
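The strict/lenient switch can be sketched for the integer edge. Everything here is an assumption made for illustration: the BasicNumberT struct, the intToBasic name, the numeric status values, and the choice of "promote to real" as the lenient rule (one of the two options section 5.6 names).

```c
#include <stdbool.h>
#include <stdint.h>

// brokerErrRangeE is named in the as-built notes; the numeric values are assumptions.
enum { brokerOkE = 0, brokerErrRangeE = 3 };

// Hypothetical holder for a number about to enter a 32-bit-int engine.
typedef struct BasicNumberT {
    bool    isReal;
    int32_t i;  // valid when !isReal
    double  r;  // valid when isReal
} BasicNumberT;

// Narrow a broker int64 for BASIC under the strict/lenient policy.
int32_t intToBasic(int64_t in, bool strict, BasicNumberT *out) {
    if (in >= INT32_MIN && in <= INT32_MAX) {
        out->isReal = false;
        out->i = (int32_t)in;
        out->r = 0.0;
        return brokerOkE;
    }
    if (strict) {
        return brokerErrRangeE;  // strict: out-of-range is an explicit error
    }
    out->isReal = true;          // lenient: promote to real, precision may be lost
    out->i = 0;
    out->r = (double)in;
    return brokerOkE;
}
```

Either way the caller gets a defined outcome: an error status in strict mode, a documented coercion in lenient mode.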
3. The engine vtable -- what makes the actor loop engine-agnostic
Each adapter implements one small vtable so the broker/actor core never special-cases an engine:
typedef struct EngineT {
    const char *name;
    void   *(*createInterpreter)(struct ScriptContextT *ctx);  // ON the owning thread
    void    (*destroyInterpreter)(void *interp);
    int32_t (*loadSource)(void *interp, const char *src, int64_t len);
    int32_t (*registerNative)(void *interp, const char *name, NativeFnT fn, void *userData);
    int32_t (*callExport)(void *interp, void *exportRef, ValueT *args, int32_t argCount, ValueT *result);
    void    (*releaseExport)(void *interp, void *exportRef);   // ON the owning thread
} EngineT;
createInterpreter, destroyInterpreter, releaseExport, and every callExport run on
the context's own thread -- that is the invariant that keeps each interpreter
single-threaded.
4. The Lua adapter
Target Lua 5.4 (note 5.1/5.3 deltas where they matter). The C API surface was verified accurate; the fixes below are about lifecycle, not API names.
4.1 Native registration and the trampoline
Each registered NativeFnT becomes a Lua C closure: push the NativeFnT and its
userData as upvalues with lua_pushcclosure, then lua_setglobal (or into a module
table). The single trampoline recovers them from upvalues, marshals the Lua stack into a
ValueT[], calls the NativeFnT, and pushes the result back.
- lua_setglobal returns void (the first cut documented int -- harmless but wrong).
- Lua allocation APIs (lua_newuserdatauv, lua_createtable, ...) longjmp on OOM and never return NULL -- so NULL-checks after them are dead code; the OOM path is a Lua error, not a C return. Only luaL_newstate can return NULL and must be checked.
4.2 Marshalling ValueT <-> Lua (by value)
Scalars are direct. Strings use lua_tolstring + length (binary-safe, preserves NULs).
Tables deep-copy in both directions:
- Ingress (table -> AggregateT): normalize the table index to absolute before lua_next; route numeric keys through lua_tointeger/lua_tonumber (do not let lua_tolstring mutate a numeric key in place); fill array for the sequence part and pairs for the rest; detect cycles via an ancestor-pointer stack (lua_topointer); use lua_rawset/lua_rawget to avoid metamethods; enforce the shared depth cap.
- Egress (AggregateT -> table): build with lua_createtable, populate array as the sequence and pairs as keyed entries, with the same depth cap (the first cut had no egress cap -- a deep BASIC-origin structure could overflow the stack on the way back). Make the builder self-balancing: record lua_gettop on entry and lua_settop back on any error return.
4.3 Exporting a Lua function (and the leak fix)
A Lua function crossing the boundary becomes a valueFnE: pin the closure with
luaL_ref(L, LUA_REGISTRYINDEX), wrap it in a CallableT. The fn body does
lua_rawgeti to retrieve, marshals args onto the stack, lua_pcall, marshals the return;
on error it pulls the message with luaL_tolstring and reports it as a broker error
(sec 8).
Two bugs the verifier found, fixed here:
- The exports array needs grow-on-demand. The context is calloc'd, so the array starts NULL/0; the first export must realloc (double the cap, handle the NULL/0 seed) before storing, and luaL_unref the just-created ref if the realloc fails.
- Transient vs persistent ownership. Every Lua function passed as a native argument was creating a luaL_ref that only got released at lua_close -- an unbounded leak for any long-lived context that takes callbacks. Fix: the CallableT refcount owns the luaL_ref. When the last valueFree drops the handle to zero, the ref is released (on the Lua thread per sec 10). A function the broker retains as a real export holds a reference for as long as it is registered; a function merely borrowed for the duration of one call is released when that call's ValueT args are freed. Same mechanism, two lifetimes.
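The grow-on-demand fix can be sketched without Lua in the picture. ExportTableT and exportTablePush are hypothetical names, and the initial capacity of 4 is an arbitrary choice; in the real adapter the stored int would be the luaL_ref slot, and a failure return is the point where the caller would luaL_unref the ref it just created.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical exports array; zero-initialised (calloc'd) means refs == NULL, cap == 0.
typedef struct ExportTableT {
    int     *refs;   // registry ref slots in the real adapter
    int32_t  count;
    int32_t  cap;
} ExportTableT;

// Returns 0 on success, -1 on OOM. On failure the table is unchanged.
int32_t exportTablePush(ExportTableT *t, int ref) {
    if (t->count == t->cap) {
        // Handle the NULL/0 seed, then double.
        int32_t newCap = (t->cap == 0) ? 4 : t->cap * 2;
        int *grown = realloc(t->refs, (size_t)newCap * sizeof *grown);
        if (grown == NULL) {
            return -1;  // t->refs is still valid; caller releases the new ref
        }
        t->refs = grown;
        t->cap  = newCap;
    }
    t->refs[t->count++] = ref;
    return 0;
}
```

Assigning the realloc result to a temporary first is what keeps the old array reachable when the grow fails.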
4.4 Calling a function value from Lua
A valueFnE marshalled into Lua becomes a Lua C closure over the CallableT*, so script
authors just write cb(x, y) and it transparently dispatches (to the owner's thread if the
function lives elsewhere). A universal call(fn, ...) native is also provided for
uniformity across engines.
4.5 Context lifecycle
luaL_newstate + luaL_openlibs on the owning thread; confine the lua_State to that
thread forever; teardown luaL_unrefs outstanding refs then lua_close.
5. The my-basic adapter
Verified against paladin-t/my_basic (my_basic.h + .c read directly). Zero
hallucinated calls. The interesting work is three my-basic-specific quirks that shape the
adapter; all three are forced by the source, not stylistic.
5.1 Lifecycle and the inverted register result
mb_init (once per process) / mb_open(&bas) / mb_load_string(bas, src, true) /
mb_run / mb_close / mb_dispose. The broker pointer is threaded through the interp's
single userdata slot via mb_set_userdata / mb_get_userdata.
- mb_register_func returns a count, not a status -- nonzero means registered, 0 means duplicate/failure. That is the opposite of the MB_FUNC_OK == 0 convention, so the success test must be inverted. Names are uppercased internally (mb_strupr), so the broker key is the uppercased identifier (BASIC is case-insensitive).
5.2 The native-function protocol and the trampoline bank
The native signature is typedef int (*mb_func_t)(struct mb_interpreter_t*, void**); --
no per-callback userData parameter, and the interpreter has only one userdata slot.
So a single shared C trampoline cannot tell which broker function it is serving.
Fix (the verifier confirmed this limitation is real): a macro-generated bank of
trampolines mbTramp0 .. mbTrampN, each hardcoding its slot index, each looking up
ctx->nativeBank[slot] (the NativeFnT + userData) via the interpreter's userdata
pointer. The bank size caps how many natives one my-basic context can host; size it
generously and document it.
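The bank idea can be sketched without the my-basic headers. All types here are stand-ins chosen for illustration: FakeInterpT plays the role of the interpreter with its single userdata slot, TrampT mimics the mb_func_t shape (no per-callback userData), SlotFnT stands in for NativeFnT, and BANK_SIZE is deliberately tiny.

```c
#include <stddef.h>
#include <stdint.h>

#define BANK_SIZE 4  // hypothetical; a real bank would be sized generously

typedef int32_t (*SlotFnT)(void *userData);                   // stand-in for NativeFnT
typedef struct BankEntryT { SlotFnT fn; void *userData; } BankEntryT;
typedef struct BankCtxT { BankEntryT nativeBank[BANK_SIZE]; } BankCtxT;

typedef struct FakeInterpT { void *userdata; } FakeInterpT;   // one userdata slot only
typedef int (*TrampT)(FakeInterpT *s, void **l);              // mb_func_t-like shape

// Shared body: the slot index is the only thing that differs per trampoline.
static int bankDispatch(FakeInterpT *s, int slot) {
    BankCtxT *ctx = s->userdata;
    BankEntryT *e = &ctx->nativeBank[slot];
    return (e->fn != NULL) ? (int)e->fn(e->userData) : -1;
}

// Each generated function hardcodes its own slot index.
#define DEFINE_TRAMP(n) \
    int mbTramp##n(FakeInterpT *s, void **l) { (void)l; return bankDispatch(s, n); }

DEFINE_TRAMP(0)
DEFINE_TRAMP(1)
DEFINE_TRAMP(2)
DEFINE_TRAMP(3)

TrampT const trampBank[BANK_SIZE] = { mbTramp0, mbTramp1, mbTramp2, mbTramp3 };

// Sample native used to exercise the bank: returns the int its userData points at.
int32_t sampleNative(void *userData) {
    return *(int32_t *)userData;
}
```

Registering the k-th native then means filling nativeBank[k] and handing trampBank[k] to the engine, which is why the bank size caps the number of natives per context.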
Inside a trampoline the argument protocol is the real my-basic frame dance:
mb_attempt_open_bracket / loop mb_pop_value (honoring mb_has_arg) /
mb_attempt_close_bracket / compute / mb_push_value (or the typed mb_push_*).
5.3 String ownership (memdup is mandatory)
mb_pop_string hands back a borrowed interior pointer -- the broker must strdup/copy
it immediately. Pushed strings are taken over by the interpreter and later freed with its
allocator, so any string handed to mb_push_value/mb_make_string must come from
mb_memdup (not plain malloc). Embedded NULs cannot survive (bare char* + strlen)
-- enforce the strict/lenient policy on egress.
5.4 Aggregates: the collection API
There is no mb_make_coll. A list/dict is built by presetting coll->type = MB_DT_LIST/MB_DT_DICT then calling mb_init_coll, and accessed with
mb_get_coll / mb_set_coll / mb_remove_coll / mb_count_coll / mb_keys_of_coll.
Collection support is on by default (MB_ENABLE_COLLECTION_LIB). A broker aggregate with
both array and pairs populated collapses to a DICT (array part becomes integer keys)
per the fidelity table.
5.5 Exporting a BASIC routine -- the parked __BROKERSERVE frame
This is the my-basic-specific crux. To call a BASIC routine/lambda from C you use
mb_get_routine(s, l, name, &val) then mb_eval_routine(s, l, val, args, argc, &ret) --
and mb_eval_routine dereferences *l and hard-requires a live, non-NULL void** l
(verified at my_basic.c:14344/14358). A valid l only exists inside a running native
call. Therefore a my-basic context cannot be driven from arbitrary C; it must be parked
inside a native frame.
Design: register a native __BROKERSERVE whose C body is the context's message-pump /
serve loop. A module hands control to the broker by ending with a SERVE call (the adapter
appends one if absent). While parked there, the loop holds a valid l, which it uses to
mb_eval_routine whenever another context calls one of this module's exported routines.
mb_get_routine returns MB_FUNC_OK with a nil value when a name is absent, so the
not-found test is routine.type != MB_DT_ROUTINE, not the status code.
5.6 Numeric and identity caveats
int_t is 32-bit unconditionally (64-bit broker ints truncate -- range-check + error or
promote to real with documented precision loss); integral reals auto-collapse to int so
real/int subtype is not preserved across a BASIC hop. Both are in the fidelity table; both
follow the strict/lenient switch.
6. Threading: the actor model
Each ScriptContextT owns one interpreter, one OS thread (pthreads -- chosen over C11
<threads.h> for portability/maturity), and one inbound MPSC message queue. Interpreters
are single-threaded; only the owning thread ever enters callExport. A cross-context call
is a message; the caller blocks for the reply on a per-call condvar future (lost-wakeup
safe via a predicate loop). The verifier confirmed the core is sound: no path lets two
threads touch one interpreter, the deep-copy ownership ledger is correct on the success
path, and the epoll thread enqueuing while a context is mid-dispatch is race-free.
The fixes folded in from verification:
- One error channel. The first cut carried a separate error ValueT in the reply that the caller never freed -- a leak on every errored call, lost error text, and a second source of truth contradicting the broker's "error travels in result" contract. Fix: on failure the adapter writes the error string into result (as brokerSetError does); the reply carries only {status, result}. One channel, one owner, freed once.
- valueCopy checked on enqueue. Use the canonical int32_t valueCopy(dst, src), check each arg, and unwind partially-copied args on OOM (mirroring the broker route path). The actor layer should call the broker's marshalling, not reimplement a copy loop.
- Shutdown drains everything. On SHUTDOWN, error-reply every queued CALL and free every queued/stashed REPLY (result + error) before join -- the first cut leaked in-flight replies unwound by a nested shutdown.
- Explicit thread stack size. The reentrancy depth bound counts dispatch nesting, not C-stack bytes; set a validated stack size with pthread_attr_setstacksize (or lower the bound) so the "clean catchable depth error" promise actually holds instead of a UB overflow.
- Split the ready-handshake condvar off the queue condvar so queueCond has exactly one semantic (latent lost-wakeup footgun if a second waiter is ever added).
6.1 What a context does while blocked: always-live nested pump [DECIDED]
When context A makes a synchronous cross-context call and waits for the reply, A's thread pumps its own inbound queue instead of sleeping idle. An incoming call to A -- including a re-entrant B->A issued during the very call A is waiting on -- is serviced on A's own thread, then A resumes waiting. This was chosen over strict run-to-completion because it never deadlocks and needs no wait-for-graph deadlock detector; the rejected alternative would have had to raise a "synchronous call cycle" error on A->B->A and would leave a busy context unresponsive to other callers. The verifier validated the pump as sound: only A's thread ever enters A's interpreter (the single-threaded invariant holds), reply nesting is strict LIFO, and depth is bounded.
The contract this commits the runtime and script authors to:
- Re-entry happens only at explicit cross-context call points (x = getUserInfo(), data = sockRecv(c)), never mid-statement -- a call point is a yield point.
- Module-global state may differ after a cross-context call returns, because another call may have run on this context while it was outstanding (the same contract as any RPC). Local variables are unaffected.
- Reentrancy is depth-bounded with a catchable error (backed by an explicit pthread stack size, sec 6 fixes), so runaway ping-pong fails cleanly instead of overflowing.
Script code stays plain synchronous-blocking regardless -- info = getUserInfo() just
works; this only governs what the runtime does while a call is outstanding.
7. Networking and the dispatcher
One dedicated I/O thread runs epoll (Linux; kqueue/poll for portability) and owns no
interpreter. Async socket primitives (sockConnect, sockListen, sockSend, sockRecv,
sockClose, plus a timer) are registered once through the broker, so every language gets
them. The recommended v1 model is synchronous-blocking at the script level: data = sockRecv(conn) parks the calling context on a reply future; when epoll reports readiness,
the I/O thread builds a CALL/reply and enqueues it onto the owning context's queue, so
the result lands on the right interpreter thread. Callbacks are opt-in on top: pass a
valueFnE (e.g. onConnect(myFunc)) and the completion invokes it via the same dispatch,
always back on its home thread. No separate async keyword, no per-engine coroutine support
needed.
Fixes from verification:
- The I/O command queue must be strict tail-append FIFO (a sockSend issued right after sockConnect must be processed after the connect that registered the handle); assert it.
- The resolver/connect path must deep-copy host before the command is freed (the first cut had an unconditional free(cmd->host) that would UAF if stored by pointer).
- Wake the epoll thread for new interest via eventfd/self-pipe; deregister on close; drain eventfd to EAGAIN.
- Portability note: pthread_condattr_setclock(CLOCK_MONOTONIC) is absent on Darwin -- guard it with #ifdef and derive any monotonic timed wait accordingly (a monotonic deadline cannot be handed to a realtime-clock condvar).
8. Error model (one source of truth)
A NativeFnT returns a status int and, on failure, writes a human-readable message into
result (a valueStringE tagged as an error, or a small error-struct convention). That
single in-band channel crosses the actor boundary unchanged, is freed exactly once by the
caller, and is surfaced into the calling engine as that engine's native error
(luaL_error/lua_error for Lua, mb_raise_error for my-basic). There is no second error
field anywhere.
9. Context lifetime and the registry (UAF fix)
The critical use-after-free: contexts were addressed by raw ScriptContextT* (and exports
held raw owner pointers), while contextShutdown frees the context and destroys its
mutex/cond at runtime -- so a foreign thread could enqueue onto a freed queueMutex.
Fix:
- Address contexts by a stable integer id through a locked registry; never by raw pointer. contextEnqueue/contextCall/ioDispatch resolve id -> context under the registry lock and either hold the lock across enqueue or take a reference so the context (and its mutex) cannot be freed mid-enqueue.
- Add a generation counter to context ids and to CallableT.ownerGen so a recycled id cannot misroute an in-flight completion to a different context.
- contextShutdown: under the lock, mark dead and remove from the id map; reject new enqueues with a defined "dead context" error; drain and error-reply queued work; wait for in-flight references to drain; then free.
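The id-plus-generation lookup can be sketched as a small slot table. The names (RegistryT, ContextHandleT, registryAdd, registryResolve, registryRemove) and the fixed slot count are hypothetical, and the registry lock is left out for brevity; a real version would hold that lock around each of these calls and pin the context before releasing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REGISTRY_SLOTS 8  // hypothetical fixed capacity

typedef struct ContextSlotT {
    void    *ctx;         // NULL when the slot is free
    uint32_t generation;  // bumped every time the slot is vacated
} ContextSlotT;

typedef struct RegistryT { ContextSlotT slots[REGISTRY_SLOTS]; } RegistryT;

// What callers hold instead of a raw context pointer.
typedef struct ContextHandleT { uint32_t id; uint32_t gen; } ContextHandleT;

bool registryAdd(RegistryT *r, void *ctx, ContextHandleT *out) {
    for (uint32_t i = 0; i < REGISTRY_SLOTS; i++) {
        if (r->slots[i].ctx == NULL) {
            r->slots[i].ctx = ctx;
            out->id  = i;
            out->gen = r->slots[i].generation;
            return true;
        }
    }
    return false;  // registry full
}

// Returns NULL for a dead, recycled, or out-of-range handle.
void *registryResolve(const RegistryT *r, ContextHandleT h) {
    if (h.id >= REGISTRY_SLOTS) return NULL;
    const ContextSlotT *s = &r->slots[h.id];
    return (s->ctx != NULL && s->generation == h.gen) ? s->ctx : NULL;
}

void registryRemove(RegistryT *r, ContextHandleT h) {
    if (registryResolve(r, h) == NULL) return;  // stale handle: nothing to do
    r->slots[h.id].ctx = NULL;
    r->slots[h.id].generation++;                // invalidates every old handle
}
```

A completion that still carries the old handle resolves to NULL after the slot is reused, which is the "dead context" error path rather than a misrouted message.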
10. Function-value lifecycle across threads
CallableT.refCount is atomic. valueCopy bumps it; valueFree drops it. The subtlety:
releasing the underlying closure is an interpreter op (luaL_unref / my-basic unref) that
must run on the owner's thread. So when a drop reaches zero on a foreign thread, the
broker posts a release message to the owner context instead of touching the interpreter
directly; the owner releases the closure on its own thread and frees the CallableT. If
the owner is already gone (alive == false), the CallableT shell is freed directly (the
closure is already gone with the interpreter) and any pending invoke returns a clean error.
11. Build order
- Broker core: broker.h (the canonical ValueT/AggregateT/CallableT/enums), valueCopy/valueFree/valueMove with full tag coverage and the depth cap, the name->entry registry, brokerCall, the error convention. Unit-test value round-trips and deep-copy/free under a leak checker before any engine exists.
- Lua adapter against the core: trampoline, scalar+string marshalling, table deep-copy both directions with caps, native registration, export with the refcounted luaL_ref lifecycle and exports-array growth. Test C<->Lua and Lua-export-called-from-C single-threaded.
- my-basic adapter: lifecycle, the trampoline bank, the arg-frame protocol, mb_memdup string ownership, the collection mapping, and the parked __BROKERSERVE export frame. Test C<->BASIC and the full Lua<->broker<->BASIC round-trip against the fidelity table (assert the lossy edges error or coerce exactly as documented).
- Actor layer: ScriptContextT, the MPSC queue, the reply future, the id+generation registry, the single error channel, the chosen block-while-waiting semantics (sec 6.1), and shutdown drain. Stress cross-context calls and teardown under a thread sanitizer.
- Networking/dispatcher: epoll I/O thread, the FIFO command queue, the socket/timer natives, completion dispatch onto owning queues, callbacks via valueFnE.
- Squirrel (later): a third adapter validates that the vtable + canonical ValueT really make new engines O(1).
12. Open decisions
- Block-while-waiting semantics: DECIDED -- always-live nested pump (sec 6.1).
- Strict-vs-lenient default for the lossy marshal edges (recommend: strict by default so truncation/loss is an explicit error; lenient opt-in per call).
- my-basic native-bank size (cap on natives per BASIC context).
- Whether a foreign function injected into BASIC should be transparently callable as a routine value (cb(x)) or only via the portable CALL(fn, ...) primitive (Lua gets the transparent form for free; BASIC's transparent form needs confirming).
13. Implementation notes (as-built: broker core + both adapters)
Built and tested: broker.h/value.c/broker.c (core), luaAdapter.* (Lua 5.4),
mybasicAdapter.* (vendored my-basic in vendor/), with testBroker/testLua/
testMyBasic/testPolyglot -- 378 checks, clean under ASan+UBSan. The polyglot test
proves the thesis: one C native called from both engines, and a Lua function invoked from
a BASIC program through the broker.
Core refinement. The single global callable-release hook could not distinguish a Lua
closure from a my-basic routine, so release is now a per-callable CallableReleaseFnT
passed to callableCreate (design sec 10's "owner releases the closure", just synchronous
for now). Added callableUserData so a release fn can reach its closure handle.
Lua adapter. Context pointer lives in lua_getextraspace. Native bindings are
context-owned {fn,userData} structs referenced by a light-userdata upvalue on one shared
trampoline. A Lua function crossing out becomes a CallableT over a pinned luaL_ref
(released via luaL_unref in the per-callable release fn); transient callback args are
freed automatically because valueFree drops the handle. A CallableT crossing in becomes
a callable userdata with __call/__gc. Lua allocation APIs longjmp on OOM (no NULL
checks). Caveat: release exported callables before luaContextDestroy (the luaL_ref
lives in that state's registry).
my-basic adapter (the high-effort one; these rules were forced by ASan):
- Build with -DMB_DOUBLE_FLOAT (double reals) and link -lm.
- Native signature has no per-call userData and one interpreter userdata slot, so a macro-generated trampoline bank (MB_BANK_SIZE) supplies slot-specific entries that recover the binding from the context.
- mb_register_func returns a count: nonzero = success, 0 = failure (inverted vs the usual MB_FUNC_OK == 0).
- Ownership is asymmetric and was the main source of bugs (verified against the my-basic source during adversarial review):
  - A popped collection is owned by the consumer (mb_dispose_value after marshalling); a popped string is a borrowed interior pointer (copy, never free).
  - mb_set_coll copies a scalar/string key-value (dispose your copy after) but stores a collection by pointer without a reference -- so a nested collection needs an explicit mb_ref_value before the set, and must then NOT be disposed (the parent owns it).
  - mb_push_value transfers a collection, but borrows a string -- a string result must be pushed with mb_push_string (which marks it for lazy destroy), not mb_push_value.
  - mb_eval_routine borrows its arguments (it never frees them), so marshalled routine args are disposed by the caller after the call -- and the return value is marshalled out first, because a routine may return one of those borrowed arguments.
  - Routine values are not ref-counted (mb_ref_value/mb_unref_value corrupt them); a routine name must be uppercased before mb_get_routine (BASIC uppercases at parse).
  - int64 entering BASIC is range-checked to 32-bit int_t (brokerErrRangeE on overflow).
- Routine export uses mb_get_routine (by name) + mb_eval_routine, both of which need a live void** l. That cursor only exists inside a native call, so the dispatch stashes it in currentL; an exported BASIC-routine CallableT is therefore valid only while the context is serving (a native frame is on the stack -- what the actor layer's parked __BROKERSERVE frame will guarantee). For now, fetch and invoke within one native call.
- One interpreter per program: a my-basic context hosts a single program; reset+reload after disposing native-pushed collection intermediates is unreliable, so the tests spin a fresh context per run. (The actor layer will own one long-lived parked context per module, which sidesteps this.)
Build/verify. Core compiled strict (-Wconversion -Wsign-conversion); adapters drop
those two (engine headers use wide macros) but keep -Wall -Wextra -Werror. All three
engines are vendored under vendor/ and built from source -- vendor/lua (Lua 5.4.6,
library = src/*.c minus the lua.c/luac.c mains), vendor/mybasic (my-basic),
vendor/squirrel-src (Squirrel 3.2) -- each relaxed and un-sanitized but linked into the
sanitized binaries so cross-boundary heap misuse is still caught. Nothing depends on a
system-installed engine or pkg-config, so the build is reproducible. The Lua platform
define is selected automatically: $(OS) first (Windows sets Windows_NT and has no
uname -> no define / ISO C), else from uname -s (LUA_USE_LINUX + -ldl /
LUA_USE_MACOSX / LUA_USE_POSIX). NB the project is otherwise Unix-only (pthreads,
sanitizers, setarch), so the Windows branch only keeps the define correct.
Function-value lifecycle across threads (sec 10), DONE. callableInvoke and the
final callableRelease are now thread-correct. The core exposes two installable hooks
(callableSetInvokeHook/callableSetReleaseHook, the same pattern as brokerSetRouteHook)
so it stays independent of the actor layer; actorInit installs them. An invoke from a
thread other than the callable's owner is marshalled to the owner's thread by reusing the
CALL machinery (a callable's fn+userData are exactly a native call -- callableFn is
the one new accessor). The final reference drop is routed too: a new messageReleaseE
posts the finalize (fire-and-forget) to the owner, which runs the engine release
(luaL_unref / sq_release) on its own thread. callableFinalize is the shared "run
release + free shell" tail; the core still runs it inline when no actor is present (so the
single-threaded testCallableDead semantics -- a dead callable still runs its release on
last drop -- are preserved). testEngineLua captures a Lua closure on its context's
thread, then invokes and releases it from the main thread; both marshal to the owner,
ASan/TSan-clean. Limit: releasing a callable whose owner context has been destroyed is
the deferred non-quiescent-teardown case (sec 9) -- best-effort inline finalize for now.
JavaScript adapter (Duktape), the fourth engine. Vendored Duktape 2.7.0 (the
single amalgamated duktape.c/duktape.h/duk_config.h) in vendor/duktape, built
relaxed/un-sanitized. src/js/jsAdapter.* mirrors the Lua/Squirrel adapters: one shared
trampoline recovers its binding from a hidden property on the function object
(duk_push_current_function + an internal \xFF-prefixed key) and dispatches through
brokerCall; marshalling covers scalars, binary-safe strings, and the hybrid aggregate
(JS array <-> list, object <-> map) with the depth cap. JS numbers are doubles, so an
integral in-range number round-trips as an int (else a real). A JS function crossing out
becomes a refcounted CallableT over a Duktape heap pointer kept alive by a per-heap
export registry object in the global stash (the slot is dropped on release) -- and it
participates in the sec-10 cross-thread routing, so a JS closure captured on its context's
thread is invoked and released from another thread correctly. src/js/jsEngine.* is the
EngineT binding. testJs (single-threaded: scalar/string/array/object marshalling, export +
invoke-from-C, closure-as-arg callback, error paths) and testEngineJs (threaded:
cross-context call + the sec-10 callback) -- clean under ASan/UBSan and TSan (make
tsanjs). Adding the engine touched zero lines of the broker core or actor layer (one
adapter TU + one engine-binding TU + Makefile rules), re-confirming the
O(1)-engine-add thesis. v1 limit (as with Squirrel): pushing a foreign CallableT into JS
is unsupported.
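The number rule ("an integral in-range number round-trips as an int, else a real") could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, "in range" is taken to mean the int64_t range, and negative zero is folded into integer 0. The actual adapter may draw those lines differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true and stores the integer when d is integral and fits int64_t;
// returns false when the value should stay a real (fractions, NaN, infinities,
// and magnitudes outside the int64_t range).
bool demoNumberAsInt(double d, int64_t *out) {
    assert(out != NULL);
    // -2^63 and 2^63 are both exactly representable as doubles; 2^63 is the
    // first value that no longer fits. NaN fails both comparisons.
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0)) return false;
    int64_t asInt = (int64_t)d;          // safe: d is known to be in range
    if ((double)asInt != d) return false; // had a fractional part
    *out = asInt;
    return true;
}
```

A marshaller built on this would emit an int value when the function returns true and a real value otherwise.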
Source layout. src/ holds the project source (core + actor directly in src/,
one subdir per script language: src/lua, src/mybasic, src/squirrel, src/js);
tests/ holds the test programs; obj/ collects every object file (ours and the vendored
engines, via patsubst into obj/); bin/ collects the binaries. The Makefile finds our
sources by VPATH and groups object rules by flag set; -MMD -MP generate header
dependencies automatically. make clean removes obj/ and bin/. make test builds and
runs all ten binaries; make tsan/make tsansq/make tsanjs are the ThreadSanitizer
variants.
16. Threading model rewrite -- host-thread natives, fire-and-forget scripts
Supersedes the earlier "natives run inline on the calling context thread" model. The
host's own thread is now an implicit host context (id 0): it has a queue but no OS
thread of its own, and the host drives it by calling calogPump in its loop.
- Scripts are fire-and-forget. calogContextEval(ctx, src) enqueues the script onto the
  context's thread and returns a status (not the result); the script runs asynchronously,
  and results come back by calling natives.
- A registered native runs on the host thread, serialized. A script calling one posts a
  CALL onto the host queue and parks; the host runs it during calogPump. So host C code
  is never called concurrently and needs no locking. actorRoute inlines a call already on
  the host thread; otherwise it marshals to the host context (id 0).
- calogRegisterInline is the escape hatch: the registry entry's runInline flag (which
  replaced ownerCtxId) makes the native run on the calling script's thread.
- Errors from a fire-and-forget script are posted to the host queue and delivered to the
  CalogErrorFnT handler (calogSetErrorHandler) during calogPump (default: log to stderr).
- Function values (CalogFnT) still run on their owning engine's thread -- sec 10 routing
  is unchanged, and calogFnInvoke from the host blocks-and-pumps the host queue while it
  waits (the same nested pump, now applied to the host context).
- Nested eval is allowed: a new eval that arrives while a context is mid-script (parked
  on a native call) runs nested via pumpUntil -- consistent with the sec-6.1 re-entrancy
  contract (interpreters support nested pcall/peval).
API shape: calogRegister(c,name,fn,ud) / calogRegisterInline(...);
calogContextOpen(c,engine) -> CalogContextT* (create+start merged, since nothing is
registered between them anymore) and calogContextClose; calogContextEval(ctx,src)
fire-and-forget; calogPump; calogSetErrorHandler. CalogConfigT and the
createInterpreter config parameter are gone -- a context now exposes every
registered native (the engine binding walks the registry via the internal
calogForEach). Tests rewritten to drive calog the host way (register, open, eval,
pump-until-a-native-records-the-result); testActor is now engine-free, exercising the
dispatch machinery with C callables (calogFnCreate) on synthetic contexts. Verified:
make test 441 checks across 11 binaries (incl. examples/embed.c), gcc + clang
strict, ASan/UBSan + TSan clean (make tsan/tsansq/tsanjs).
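The host-side half of this model (a queue that only the host drains, carrying both native calls and script errors) can be illustrated with a deliberately single-threaded sketch. All names here are hypothetical and nothing is taken from calog.h. The real runtime posts from context threads under a lock and parks the caller until the reply arrives, which this model leaves out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum DemoMsgKindE { demoMsgCallE, demoMsgErrorE } DemoMsgKindE;

typedef struct DemoMsgT {
    struct DemoMsgT *next;
    DemoMsgKindE     kind;
    int64_t          arg;   // CALL payload
    const char      *text;  // ERROR payload (not owned by the message)
} DemoMsgT;

typedef void (*DemoNativeFnT)(int64_t arg, void *userData);
typedef void (*DemoErrorFnT)(const char *text, void *userData);

typedef struct DemoHostT {
    DemoMsgT     *head, *tail;
    DemoNativeFnT native;   void *nativeData;
    DemoErrorFnT  onError;  void *errorData;  // NULL: log to stderr
} DemoHostT;

// Append a message to the host queue; 0 on success, -1 on allocation failure.
int demoPost(DemoHostT *host, DemoMsgKindE kind, int64_t arg, const char *text) {
    DemoMsgT *msg = calloc(1, sizeof(*msg));
    if (msg == NULL) return -1;
    msg->kind = kind;
    msg->arg  = arg;
    msg->text = text;
    if (host->tail != NULL) host->tail->next = msg; else host->head = msg;
    host->tail = msg;
    return 0;
}

// Drain the queue on the calling (host) thread; returns messages handled.
// Because only this function runs natives, they never run concurrently.
int demoPump(DemoHostT *host) {
    int handled = 0;
    while (host->head != NULL) {
        DemoMsgT *msg = host->head;
        host->head = msg->next;
        if (host->head == NULL) host->tail = NULL;
        if (msg->kind == demoMsgCallE) {
            assert(host->native != NULL);
            host->native(msg->arg, host->nativeData);
        } else if (host->onError != NULL) {
            host->onError(msg->text, host->errorData);
        } else {
            fprintf(stderr, "script error: %s\n", msg->text);
        }
        free(msg);
        handled++;
    }
    return handled;
}

// Helper callbacks for experiments.
void demoRecordResult(int64_t arg, void *userData) { *(int64_t *)userData = arg; }
void demoCountError(const char *text, void *userData) { (void)text; (*(int *)userData)++; }
```

A host loop in this style keeps calling demoPump until a native has recorded the value it is waiting for, which mirrors the "pump-until-a-native-records-the-result" pattern the rewritten tests use.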
CalogT owns its contexts; ids are unbounded. The active-context registry moved
from context.c file-static globals into struct CalogT (now defined in
calogInternal.h): a runtime owns both its native-function registry and its
active-context registry (ctxMutex, ctxSlots, freelist). context.c reaches it via
one runtime pointer set in calogActorInit (which also refuses a second runtime).
So calogDestroy closes every still-open context automatically -- the host need not
track them (a test opens 32 and never closes them; ASan confirms no leak). Context ids
widened to uint64 (32-bit slot index + 32-bit generation), so neither the live
count nor open/close churn hits a preset ceiling; calogContextId/calogCurrentId
return uint64_t. The now-dead ownerGen parameter was dropped from calogFnCreate
(generation lives in the packed id). Re-verified: make test 473 checks, ASan no
leaks, TSan clean, gcc + clang strict.
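A packed id of this kind might be handled as in the sketch below. The assumptions are labeled in the code: which half holds the slot and which the generation is a guess, the slot table is a small fixed array for brevity (the text says the real one is unbounded), and all demo* names are hypothetical. The point it illustrates is that a stale id fails the generation check instead of resolving to whoever recycled the slot.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SLOT_COUNT 4u  // fixed size only to keep the sketch short

typedef struct DemoSlotT {
    uint32_t generation;  // bumped each time the slot is claimed
    int      live;
} DemoSlotT;

static DemoSlotT demoSlots[DEMO_SLOT_COUNT];

// Assumed layout: generation in the high 32 bits, slot index in the low 32.
uint64_t demoPackId(uint32_t slot, uint32_t generation) {
    return ((uint64_t)generation << 32) | (uint64_t)slot;
}
uint32_t demoIdSlot(uint64_t id)       { return (uint32_t)(id & UINT32_MAX); }
uint32_t demoIdGeneration(uint64_t id) { return (uint32_t)(id >> 32); }

// Claim a free slot and return its packed id, or 0 when the table is full.
uint64_t demoOpen(void) {
    for (uint32_t slot = 0; slot < DEMO_SLOT_COUNT; slot++) {
        if (!demoSlots[slot].live) {
            demoSlots[slot].live = 1;
            demoSlots[slot].generation++;
            // Generation 0 is never handed out, so no valid id equals 0.
            assert(demoSlots[slot].generation != 0);
            return demoPackId(slot, demoSlots[slot].generation);
        }
    }
    return 0;
}

// Returns the slot index for a live id, or -1 for a stale or unknown id.
int demoResolve(uint64_t id) {
    uint32_t slot = demoIdSlot(id);
    if (slot >= DEMO_SLOT_COUNT) return -1;
    if (!demoSlots[slot].live) return -1;
    if (demoSlots[slot].generation != demoIdGeneration(id)) return -1;
    return (int)slot;
}

// Free the slot; the old id stops resolving even after the slot is reused.
void demoClose(uint64_t id) {
    int slot = demoResolve(id);
    if (slot >= 0) demoSlots[slot].live = 0;
}
```

Opening, closing, and opening again yields two ids that share a slot index but differ in generation; only the second one resolves.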
Independent runtimes in one process. The one-runtime limit was not fundamental --
just process-global state that hadn't moved into CalogT. All of it has now moved: the host
context, the routing hooks (routeHook/invokeHook/releaseHook), and the error sink
are CalogT fields; a CalogFnT carries a runtime pointer so calogFnInvoke/release
reach the right hooks (the callable path has no CalogT otherwise). The dispatch
reaches its runtime through the object it already holds -- the route hook is handed its
calog, the callable hooks read calogFnRuntime, context-thread code uses
context->broker, and the rest take an explicit calog argument. The only remaining
process global is currentContext, and it is thread-local (it names the calling
thread's context). So the setters (calogSetRouteHook etc.) are gone -- calogActorInit
assigns the fields directly -- and the runtime static that the earlier review flagged
is deleted. Runtimes are isolated: don't pass a value or callable between them (a
cross-runtime reply cannot route). A test spawns N threads, each creating, driving, and
destroying its own runtime concurrently.
One thread may host several runtimes. calogPump(calog) sets currentContext to
calog's host context for the drain and restores it after, so a single thread can drive
many runtimes by pumping each in turn -- a native serviced during calogPump(A) sees
calogCurrent() == A even if the thread also hosts B. Two consequences fall out and are
handled: context ids number from 1 in every runtime, so the "already on the owner's
thread" (inline) and "caller can take the token/pump path" (reply) decisions match the
runtime too, not the id alone -- a foreign or wrong-runtime caller takes a reply box,
which cannot misroute. A test creates two runtimes on one thread, runs a script in each,
and pumps both in a loop, asserting each runtime's native resolved calogCurrent() to
its own runtime (it fails if the pump doesn't rebind currentContext). Re-verified:
make test 480 checks, ASan no leaks, TSan clean (both concurrent runtimes and one
thread pumping two), gcc + clang strict.
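The save/rebind/restore step in the pump can be sketched in a few lines. This is a simplified model with hypothetical names: it collapses "host context" and "runtime" into one struct and gives each runtime a single pending native instead of a queue. What it shows is the part the test above guards: a thread-local "current" pointer is set for the duration of the drain and restored afterwards.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoRuntimeT DemoRuntimeT;
typedef void (*DemoNativeFnT)(DemoRuntimeT *self);

struct DemoRuntimeT {
    DemoNativeFnT       pending;  // one queued native, for brevity
    const DemoRuntimeT *seen;     // what the native observed as "current"
};

// Names the runtime being serviced on the calling thread (C11 thread-local).
static _Thread_local DemoRuntimeT *demoCurrent = NULL;

DemoRuntimeT *demoCurrentRuntime(void) { return demoCurrent; }

// Drain one runtime: bind "current" to it for the drain, then restore the
// previous binding so nested or interleaved pumps of other runtimes stay correct.
void demoPump(DemoRuntimeT *runtime) {
    assert(runtime != NULL);
    DemoRuntimeT *saved = demoCurrent;
    demoCurrent = runtime;
    if (runtime->pending != NULL) {
        DemoNativeFnT fn = runtime->pending;
        runtime->pending = NULL;
        fn(runtime);
    }
    demoCurrent = saved;
}

// Helper native: records which runtime was current while it ran.
void demoRecordCurrent(DemoRuntimeT *self) { self->seen = demoCurrentRuntime(); }
```

Pumping two runtimes in turn on one thread leaves each one's seen field pointing at itself, and the thread-local is back to its previous value after every pump.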
Loading a script by filename. Each engine carries a NULL-terminated extensions
list ({"lua"} / {"js"} / {"nut"} / {"bas"}), and a host makes engines available
for filename-based loading with calogRegisterEngine(calog, &engine).
calogContextLoad(calog, base) then walks the registered engines in registration order
(each engine's extensions in order), forms "<base>.<ext>", and the first one that
fopens wins: it reads the file on the calling thread, opens a context on that engine,
and loads the contents fire-and-forget -- returning the context (NULL if nothing matched
or the load failed). Registration matters for more than search order: hardcoding the
built-in engine vtables in the core would force-link all of them (and their vendored
runtimes) into every binary, defeating the per-engine archives -- so the host opts in,
and a binary that never references an engine pulls in none (testActor stays
engine-free). Engine selection is fundamentally a build-time (link) choice, so
calogRegisterBuiltinEngines (a header-inline in calog.h) registers exactly the
engines whose CALOG_WITH_<ENGINE> macro is set -- the host defines those alongside the
archives it links, and the inline emits nothing (references no engine) unless called, so
it never force-links.
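The search order described here (engines in registration order, each engine's extensions in order, first successful fopen wins) is sketched below. Everything in it is illustrative: the Demo*/demo* names, the fixed-size registry, and the two sample engines are assumptions, and the sketch stops at "which file opened" instead of opening a context and loading the contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct DemoEngineT {
    const char        *name;
    const char *const *extensions;  // NULL-terminated list, e.g. {"lua", NULL}
} DemoEngineT;

static const char *const demoLuaExtensions[] = { "lua", NULL };
static const char *const demoJsExtensions[]  = { "js", NULL };
const DemoEngineT demoLuaEngine = { "lua", demoLuaExtensions };
const DemoEngineT demoJsEngine  = { "js", demoJsExtensions };

#define DEMO_MAX_ENGINES 8
static const DemoEngineT *demoEngines[DEMO_MAX_ENGINES];
static size_t             demoEngineCount = 0;

// The host opts in per engine; nothing is registered by default.
int demoRegisterEngine(const DemoEngineT *engine) {
    if (engine == NULL || demoEngineCount == DEMO_MAX_ENGINES) return -1;
    demoEngines[demoEngineCount++] = engine;
    return 0;
}

// Try "<base>.<ext>" for every registered engine and extension, in order.
// On success returns the engine, leaves the path in `path`, and hands the open
// stream to the caller through *outFile. Returns NULL if nothing opened.
const DemoEngineT *demoFindScript(const char *base, char *path, size_t pathSize,
                                  FILE **outFile) {
    assert(base != NULL && path != NULL && outFile != NULL);
    for (size_t e = 0; e < demoEngineCount; e++) {
        const DemoEngineT *engine = demoEngines[e];
        for (size_t x = 0; engine->extensions[x] != NULL; x++) {
            int written = snprintf(path, pathSize, "%s.%s", base, engine->extensions[x]);
            if (written < 0 || (size_t)written >= pathSize) continue;  // would truncate
            FILE *file = fopen(path, "rb");
            if (file != NULL) {
                *outFile = file;
                return engine;
            }
        }
    }
    *outFile = NULL;
    return NULL;
}
```

With both sample engines registered and only a .js file on disk, the lookup skips the Lua candidate and returns the JS engine; the caller owns the returned stream and must fclose it.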
my-basic as an actor engine. Making my-basic loadable meant running it under the
actor model for the first time, which exposed two things. (1) Its native dispatch called
the C function directly instead of through calogCall, so natives ran on the my-basic
context thread rather than marshalling to the host -- fixed by routing mbDispatch
through calogCall (the binding now stores the registry name), matching the other
engines. (2) my-basic keeps process-global state -- lazy mb_init singletons and a
global _mb_allocated counter touched on every allocation (forced on in the vendored
header) -- so two my-basic contexts on different threads race (TSan-confirmed). The
singletons are built once by mb_init and read-only thereafter, so the only
execution-time shared write is that counter; a one-line vendored patch makes it
_Atomic (the original is preserved as vendor/mybasic/myBasic.c.orig). With the
counter safe, the my-basic engine (not the adapter, which stays usable single-threaded
and lock-free) needs a lock only across lifecycle -- mb_init's first-context build,
mb_dispose's last-context teardown, and the shared context refcount -- and NOT across
runSource, so several my-basic scripts execute concurrently. A tsanmb target proves
the parallel case is race-free (verified further by a 4-context stress running
arithmetic, strings, lists, and booleans). Verified: make test 494 checks (13
binaries), ASan no leaks, TSan clean on all four engines
(tsan/tsansq/tsanjs/tsanmb), gcc + clang strict.
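Why a one-line change can be enough: in C11, compound assignment on an _Atomic object is itself an atomic read-modify-write, so existing `+=` / `-=` update sites need no edits once the declaration changes. The sketch below shows that idea with hypothetical names; the counter's real type and the vendored update sites are not reproduced here, so size_t and the wrapper functions are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Stand-in for a process-global allocation counter. Declaring it _Atomic is
// the only change; the += and -= below become atomic read-modify-writes.
static _Atomic size_t demoAllocated = 0;

void *demoAlloc(size_t size) {
    void *block = malloc(size);
    if (block != NULL) demoAllocated += size;  // atomic on an _Atomic object
    return block;
}

void demoFree(void *block, size_t size) {
    if (block == NULL) return;
    assert(demoAllocated >= size);
    free(block);
    demoAllocated -= size;                     // atomic on an _Atomic object
}

size_t demoAllocatedBytes(void) { return demoAllocated; }  // atomic load
```

Several threads calling demoAlloc and demoFree concurrently then update the counter without a data race, which is the property the tsanmb target checks for the real engine.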
15. Public embedding API (calog.h) -- as-built (superseded by sec 16 for threading/API)
calog is packaged as an embedding library: a host links it, registers its own native
C functions, creates script contexts on an engine, and runs scripts. Every public
symbol carries a calog prefix (types Calog...T, enums Calog...E) so the library
is a good citizen in a host binary. The API was curated to the minimum:
- One handle, one header. CalogT is the runtime; calogCreate() composes the registry with
  the actor layer (installs the routing hooks) and calogDestroy() tears both down -- no
  separate init/shutdown for the host. The entire embedding surface is src/calog.h (~30
  functions); internal machinery (the registry entry type, the route/invoke/release
  hooks, the low-level callable lifecycle, the split calogBrokerCreate/calogActorInit)
  lives in src/calogInternal.h, which host code never includes. calog.h leaks no internal
  symbol.
- One config type. The three per-engine configs collapsed into CalogConfigT (exposeNames
  + exposeCount), used by every built-in engine vtable.
- Value model unchanged, just prefixed. CalogValueT / CalogAggT / CalogFnT + constructors
  (calogValueInt, ...), ops (calogValueCopy/Free/Move/Equals), aggregates (calogAgg*),
  function values (calogFnInvoke/Retain/Release), and calogFail / calogTypeName for
  writing natives. Contexts: calogContextCreate/Start/Eval/Destroy/Id, plus
  calogCurrentId / calogCurrent for natives. The built-in engine vtables are
  calogLuaEngine, calogJsEngine, calogSquirrelEngine (a host may also supply a custom
  CalogEngineT).
- Packaging. make builds lib/libcalog.a (calog itself: core + actor + every
  adapter/binding) and separate vendored-engine archives (liblua.a, libduktape.a,
  libsquirrel.a, libmybasic.a). A host links libcalog.a plus whichever engine archives it
  uses; unused adapters (and their engine deps) stay unlinked because static members are
  pulled only when referenced -- so a JS-only host never links Lua/Squirrel. The tests
  consume the archives; the threaded/engine tests use only calog.h, validating that the
  public surface is complete. examples/embed.c is a ~30-line host (public header only)
  that registers a native and calls it from JavaScript.
- Reconfirmed: rename + restructure kept all 441 checks passing across 10 test binaries,
  clean under ASan/UBSan and TSan (all four engines), gcc + clang strict.
14. Implementation notes (as-built: actor layer, engine-on-a-thread, Squirrel)
The actor layer (context.h/context.c, build step 4) is built and tested:
testActor exercises cross-context routing, the always-live nested pump (the
re-entrant A->B->A deadlock test), and a concurrent fan-out stress; clean under
ASan+UBSan and ThreadSanitizer (make tsan, run under setarch -R -- some kernels
hand out more mmap randomization than TSan's shadow tolerates). One thread + one
MPSC queue per ScriptContextT; brokerCall routes through an installed hook
(brokerSetRouteHook) so owner-0/same-context calls run inline and others marshal
to the owning thread; an external caller blocks on a private reply box, a context
caller pumps. The reply carries only {status, result} -- the single error channel
(sec 8) is structural, the error string rides in result. contextSendBlocking
and contextReply are the shared enqueue-wait and reply tails behind both CALL and
EVAL dispatch.
Generationed registry (sec 9), DONE. Context ids pack a 16-bit slot index and
a 16-bit generation; the registry is a slot table plus a freelist. contextDestroy
unlinks a context under the registry lock (after stopping+joining its thread) and
returns the slot to the freelist; the next reuse bumps the generation. A stale id
(slot since freed/recycled) resolves to brokerErrDeadE, never misroutes to the
recycler -- testActor's generation test proves it. The registry lock is held
across enqueue, so a foreign enqueue cannot race a destroy onto a freed queue mutex.
Still quiescence-assuming (no call to the context in flight at teardown); in-flight
reference draining is the remaining sec 9 hardening.
Engine on a thread (the EngineT vtable). EngineT gained runSource;
contextEval(context, source, result) marshals a script run onto the context's own
thread (a new messageEvalE) and blocks like a call. Each adapter's engine binding
lives in its own TU (luaEngine.*, squirrelEngine.*) -- the only Lua/Squirrel
files that depend on the threading layer, keeping the adapters thread-agnostic so
testLua/testPolyglot link them without context.o/pthread. createInterpreter
runs on the thread and exposes the configured natives there. Crucially, the exposed-
native trampolines now dispatch through brokerCall (by broker+name) instead of a
captured fn pointer, so an exposed native owned by another context is transparently
routed to its thread -- the script author still writes doubleIt(21). With no route
hook installed this is identical to the old inline path (testLua still passes).
testEngineLua proves a real Lua interpreter on a context thread calling a thread-
agnostic native and a cross-context native, on the correct threads.
Squirrel adapter (sec 11 step 6), the O(1)-engine-add validation. Vendored
Squirrel 3.2 in vendor/squirrel-src (C++), built relaxed/un-sanitized with
-D_SQ64 -DSQUSEDOUBLE so SQInteger/SQFloat are 64-bit int / double matching
ValueT -- the adapter shares those defines so the ABI matches. squirrelAdapter.*
mirrors the Lua adapter: one shared trampoline recovers its binding from the
closure's single free variable (which the VM pushes onto the stack after the args,
so it sits at the top -- verified in sqvm.cpp CallNative), marshals scalars,
binary-safe strings, and the hybrid aggregate (array<->list, table<->map) with the
shared depth cap, and dispatches through brokerCall. testEngineSquirrel runs a
real VM on a thread doing the cross-context call plus string and array round-trips;
clean under ASan+UBSan and TSan (make tsansq). The total surface a new engine
added: one adapter TU + one engine-binding TU + Makefile rules -- no change to the
broker core or the actor layer, which is the thesis.
Squirrel closure export, DONE. A Squirrel closure crossing the boundary now
becomes a refcounted CallableT over a pinned HSQOBJECT (sq_addref/sq_release,
mirroring Lua's luaL_ref lifecycle): squirrelExport fetches a named global
closure, and a closure passed as a native argument is exported the same way during
ingress (the VM's foreign pointer -- finally used -- recovers the owning context).
squirrelCallableInvoke runs sq_pushobject+sq_call on the owner's VM and
marshals the return; squirrelCallableRelease sq_releases on the owner thread.
Single-threaded testSquirrel covers export+invoke-from-C, a closure passed as an
argument and called back through the broker, and the not-found/type-error paths;
ASan-clean (no addref/release leak). Caveat (same as Lua): release exported
callables before squirrelContextDestroy. Remaining limit: the reverse direction
(a foreign CallableT pushed INTO Squirrel so a script can call it) returns
brokerErrUnsupportedE -- Squirrel has no clean callable-userdata-with-__gc like
Lua, so it needs a class instance with a _call metamethod + release hook.
make test runs all seven binaries (411 checks). make tsan covers the actor core
and the Lua engine path; make tsansq the Squirrel path.
Adversarial review (3 parallel reviewers: actor concurrency, Squirrel adapter, Lua trampoline + engine bindings). Two real defects found and fixed:
- NULL-interpreter crash. threadMain ignores createInterpreter's status, so a failed
  create (e.g. a config expose-name that was never registered) left a context serving
  with interp == NULL; contextDispatchEval only checked runSource != NULL, so the first
  eval called runSource(NULL,...) -> NULL deref. Fixed by guarding interp == NULL (the
  context still serves native calls, just rejects evals); regression test in
  testEngineLua (testFailedInterpreter).
- OOM lost-wakeup. contextReply's context-caller branch allocated a fresh REPLY and, on
  calloc failure, dropped the wakeup -- the caller hung in pumpUntil forever. Fixed by
  reusing the request message as the reply (it already carries the token and replyToId),
  which removes the allocation entirely, so the wakeup can no longer be lost to OOM.
The Squirrel adapter was traced clean against the real Squirrel source (trampoline
free-var indexing, stack balance, ValueT ownership, the throwerror/free order, binary
strings); added sq_reservestack guards before the recursive marshallers to match the Lua
adapter's lua_checkstack discipline.
Documented (not changed): the registry must be frozen before contexts start (brokerCall
reads it locklessly from context threads -- noted in broker.h); teardown still assumes
quiescence (sec 9); and the hybrid-aggregate-to-Squirrel-table egress flattens array
indices and integer keys into one table (same lossy edge as elsewhere in the fidelity
table).
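The OOM lost-wakeup fix comes down to turning the request into its own reply. A minimal sketch of that idea follows, with hypothetical names and a plain unlocked queue standing in for the caller's MPSC queue; the field names token and replyToId follow the text, the rest is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum DemoMsgKindE { demoMsgCallE, demoMsgReplyE } DemoMsgKindE;

typedef struct DemoMsgT {
    struct DemoMsgT *next;
    DemoMsgKindE     kind;
    uint64_t         token;      // matches a reply to the waiting caller
    uint64_t         replyToId;  // context the reply goes back to
    int32_t          status;
    int64_t          result;
} DemoMsgT;

typedef struct DemoQueueT { DemoMsgT *head, *tail; } DemoQueueT;

void demoEnqueue(DemoQueueT *queue, DemoMsgT *msg) {
    msg->next = NULL;
    if (queue->tail != NULL) queue->tail->next = msg; else queue->head = msg;
    queue->tail = msg;
}

DemoMsgT *demoDequeue(DemoQueueT *queue) {
    DemoMsgT *msg = queue->head;
    if (msg == NULL) return NULL;
    queue->head = msg->next;
    if (queue->head == NULL) queue->tail = NULL;
    return msg;
}

// Reply by rewriting the request in place and sending it back. There is no
// allocation on this path, so it cannot fail and the wakeup cannot be dropped.
void demoReply(DemoQueueT *callerQueue, DemoMsgT *request, int32_t status, int64_t result) {
    assert(callerQueue != NULL && request != NULL);
    request->kind   = demoMsgReplyE;
    request->status = status;
    request->result = result;    // token and replyToId are already in place
    demoEnqueue(callerQueue, request);
}
```

The caller that allocated the request gets the same object back as the reply and frees it once, so ownership stays with one party at a time.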