sc-dev-deploy/docs/peering.md
2026-06-01 16:43:43 -05:00

dev-deploy peering

How two dev-deploy instances find, authenticate, and exchange ops with each other: the peer model, the pairing flow, the HMAC wire protocol, the sync anchors that drive promote and pull, and how a standalone instance pairs with a single tenant on a multi-tenant server.

See also: architecture.md for the ops journal, stable UUIDs, and the apply pipeline; the README for the full table and endpoint inventory.

The peer model

A peer is one row in _dd_peers (defined in lib/schema.js). Each instance stores a row per peer it talks to; the relationship is configured independently on both sides (there is no central registry).

| Column | Type | Meaning |
| --- | --- | --- |
| peer_id | serial / integer PK | Local surrogate id (auto-assigned). |
| env_id | TEXT UNIQUE | The peer's dev-deploy env_id (the other side's _dd_env.env_id). |
| label | TEXT | Optional human label (e.g. test, prod). |
| base_url | TEXT | Where to reach the peer (e.g. http://localhost:3001 or https://tenant.example.com). |
| peer_secret_ciphertext | TEXT | Sealed shared secret (hex). |
| peer_secret_iv | TEXT | AES-GCM IV (hex). |
| peer_secret_tag | TEXT | AES-GCM auth tag (hex). |
| require_tls | INTEGER | TLS-required flag (stored as 0/1). |
| created_at | TEXT | ISO 8601 creation time. |
| last_seen_at | TEXT | ISO 8601 of the last verified inbound request from this peer; null until first contact. |

The sealed shared secret

The shared secret is 32 random bytes (randomSecret(), lib/crypto.js). It is never stored in plaintext. At rest it is sealed with AES-256-GCM (seal(), lib/crypto.js) and split across the three peer_secret_* columns as hex. The hex-text storage is deliberate: Saltcorn's SQLite insert layer JSON-stringifies object values, which would mangle a raw Buffer column (lib/schema.js).

The 32-byte key-encryption key (KEK) used by seal/open is derived once per process via HKDF-SHA256 from SALTCORN_SESSION_SECRET (getKek(), lib/crypto.js; falls back to the Saltcorn session_secret config). Because the KEK is tied to the session secret, rotating SALTCORN_SESSION_SECRET invalidates every stored pairing -- existing ciphertexts no longer decrypt (documented in lib/crypto.js).
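The sealing scheme described above can be sketched with Node's built-in crypto module. This is a minimal illustration, not the code in lib/crypto.js: the names deriveKek, sealSecret, and openSecret, the empty HKDF salt, the info label, and the 12-byte IV are all assumptions made for the example.

```javascript
const crypto = require("node:crypto");

// Assumption: empty salt and an illustrative info label; the real getKek()
// may use different HKDF parameters.
function deriveKek(sessionSecret) {
  const key = crypto.hkdfSync(
    "sha256",
    Buffer.from(sessionSecret, "utf8"),
    Buffer.alloc(0),
    Buffer.from("example-peer-kek", "utf8"),
    32
  );
  return Buffer.from(key); // hkdfSync returns an ArrayBuffer
}

// Seal a plaintext secret into the three hex columns of a _dd_peers row.
function sealSecret(kek, plaintext) {
  const iv = crypto.randomBytes(12); // assumed IV size (the usual GCM choice)
  const cipher = crypto.createCipheriv("aes-256-gcm", kek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return {
    peer_secret_ciphertext: ciphertext.toString("hex"),
    peer_secret_iv: iv.toString("hex"),
    peer_secret_tag: cipher.getAuthTag().toString("hex"),
  };
}

// Open the sealed columns; throws if the KEK or any column does not match.
function openSecret(kek, row) {
  const decipher = crypto.createDecipheriv(
    "aes-256-gcm",
    kek,
    Buffer.from(row.peer_secret_iv, "hex")
  );
  decipher.setAuthTag(Buffer.from(row.peer_secret_tag, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(row.peer_secret_ciphertext, "hex")),
    decipher.final(),
  ]);
}
```

Opening a row with a KEK derived from a different session secret fails the GCM tag check and throws, which is the mechanism behind "rotating the session secret invalidates every stored pairing".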

Plaintext only crosses the process boundary at two moments:

  • At pairing time, when the operator copies the secret into the other side's pairing form.
  • At HMAC sign/verify time, when peerSecret() (lib/peers.js) opens the sealed bytes to compute or check a signature.

rowToPeer() (lib/peers.js) deliberately omits the sealed columns from the plain accessor; callers must go through peerSecret() / peerSecretByEnvId().
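One way to keep sealed material out of the plain accessor is to strip the three columns when mapping a database row. The sketch below is hypothetical (the function name toPlainPeer and the boolean conversion of require_tls are assumptions, not the actual rowToPeer()):

```javascript
// Map a raw _dd_peers row to a plain object without the sealed columns.
function toPlainPeer(row) {
  const {
    peer_secret_ciphertext, // dropped on purpose
    peer_secret_iv,         // dropped on purpose
    peer_secret_tag,        // dropped on purpose
    ...plain
  } = row;
  // Assumption: expose the 0/1 flag as a boolean for callers.
  return { ...plain, require_tls: Boolean(plain.require_tls) };
}
```

Callers that need the secret then have no choice but to go through a dedicated accessor, which keeps decryption in one place.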

Pairing flow

Pairing is symmetric: each side ends up with a _dd_peers row pointing at the other side's env_id and base_url, and both rows seal the same shared secret. One side generates the secret; the operator pastes it into the other.

Each instance's own env_id is shown on its Peers page (peersView, lib/routes.js): "This instance's env_id is ... Paste this into the other instance's peer form." The env_id itself is a random UUID minted once at bootstrap (lib/env.js).

Steps:

  1. On instance A, open /admin/dev-deploy/peers and submit the Add peer form (peersAdd, lib/routes.js) with the peer's env_id (B's), an optional label, B's base_url, and an optional require_tls checkbox. Leave Existing secret blank.
  2. addPeer (lib/peers.js) generates a fresh 32-byte secret, seals it, and inserts the row. The plaintext secret is rendered once as 64 hex characters on the confirmation page (lib/routes.js) -- "it will not be shown again."
  3. On instance B, open its own Peers page and submit Add peer with A's env_id, A's base_url, and paste the 64-hex secret into the Existing secret field. peersAdd validates it against /^[0-9a-fA-F]{64}$/ (lib/routes.js) and passes it to addPeer as existingSecret, so B seals the identical secret rather than generating a new one.

After both rows exist, A and B share one secret and each knows the other's env_id and base_url.
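The "generate or accept" choice in the steps above can be sketched as a single helper. The name resolvePairingSecret and the trimming of the form value are assumptions for illustration; only the 64-hex pattern and the 32-byte length come from the description above.

```javascript
const crypto = require("node:crypto");

const HEX_SECRET = /^[0-9a-fA-F]{64}$/;

// Return the 32-byte secret to seal: a fresh one when the form field is
// blank (side A), or the pasted one when it is valid hex (side B).
function resolvePairingSecret(existingSecret) {
  const value = (existingSecret || "").trim(); // trimming is an assumption
  if (value === "") return crypto.randomBytes(32);
  if (!HEX_SECRET.test(value)) {
    throw new Error("existing secret must be exactly 64 hex characters");
  }
  return Buffer.from(value, "hex");
}

// Side A generates and displays the secret once; side B pastes it.
const secretOnA = resolvePairingSecret("");
const secretOnB = resolvePairingSecret(secretOnA.toString("hex"));
console.log(secretOnA.equals(secretOnB)); // true
```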

env_id is enforced UNIQUE, so re-adding the same peer fails with "peer with env_id ... already exists" (lib/peers.js).

Rotation and deletion

  • Rotate (peersRotate, lib/routes.js -> rotatePeerSecret, lib/peers.js) mints a new secret for an existing peer, re-seals it, and shows the new value once. The operator must paste the new secret on the other side (re-pair or rotate there) or the pairing breaks.
  • Delete (peersDelete, lib/routes.js -> deletePeer, lib/peers.js) removes the _dd_peers row and deletes that peer's _dd_anchors rows, so a later re-pair starts syncing from the epoch again.

The HMAC wire protocol

Every machine-API request is signed with the shared secret using HMAC-SHA256. The outbound side is lib/transport.js; the inbound check is requirePeerAuth (lib/peerAuth.js).

Headers

| Header | Source | Meaning |
| --- | --- | --- |
| X-DD-Env-Id | sender's own env_id | Caller identity; the receiver looks it up in _dd_peers via findPeerByEnvId to find the matching secret. |
| X-DD-Timestamp | String(Date.now()) | Milliseconds since epoch. |
| X-DD-Nonce | randomNonce().toString("hex") | 16 random bytes, hex (replay padding). |
| X-DD-Signature | sign(secret, canonical) | Hex HMAC-SHA256 over the canonical string. |

All four headers are required; a missing one returns 400 missing header ... (lib/peerAuth.js).

When there is a request body, the sender sets Content-Type: application/vnd.dev-deploy+json (lib/transport.js). This custom type stops Saltcorn's express.json() middleware from consuming the request stream, so the receiver can read the exact raw bytes and HMAC them verbatim -- no re-serialization, no whitespace or key-order assumptions (lib/peerAuth.js).
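A small demonstration of why the raw bytes matter: parsing and re-serializing a JSON body changes its whitespace, so its SHA-256 no longer matches what the sender hashed. The op fields in the sample body are illustrative only.

```javascript
const crypto = require("node:crypto");

const sha256Hex = (data) =>
  crypto.createHash("sha256").update(data).digest("hex");

// What the sender put on the wire (note the spaces).
const rawBody = '{ "ops": [ {"op_id": 2, "kind": "table"} ] }';

// What a receiver would get back after a JSON middleware parsed the body.
const reserialized = JSON.stringify(JSON.parse(rawBody));

const hashesMatch = sha256Hex(rawBody) === sha256Hex(reserialized);
console.log(hashesMatch); // false
```

Verifying against the raw bytes sidesteps this entirely: the receiver hashes exactly what was signed and only parses afterwards.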

The canonical string

Both sides build the signed string with buildCanonical (lib/crypto.js). It is six fields joined by newlines (\n):

    timestamp
    nonce
    METHOD
    path
    targetHost
    sha256hex(body)

  • METHOD is uppercased.
  • path is the request path including query string. Outbound it is the literal path argument; inbound it is req.originalUrl || req.url (lib/peerAuth.js).
  • body is hashed with SHA-256 (sha256Hex, lib/crypto.js); an empty body hashes the empty string. GET/HEAD never have a body (lib/peerAuth.js).
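The canonical string and its signature can be sketched as follows. The object-style argument to buildCanonical is an assumption for readability; the actual signature in lib/crypto.js may take positional arguments.

```javascript
const crypto = require("node:crypto");

function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Six fields joined by "\n", in the order listed above.
function buildCanonical({ timestamp, nonce, method, path, targetHost, body }) {
  return [
    String(timestamp),
    nonce,
    method.toUpperCase(),
    path,
    targetHost,
    sha256Hex(body == null ? "" : body), // empty body hashes the empty string
  ].join("\n");
}

// Hex HMAC-SHA256 of the canonical string under the shared secret.
function sign(secret, canonical) {
  return crypto.createHmac("sha256", secret).update(canonical).digest("hex");
}

const canonical = buildCanonical({
  timestamp: 1700000000000,
  nonce: "00".repeat(16),
  method: "get",
  path: "/dev-deploy/api/journal?since=42",
  targetHost: "tenant.example.com",
  body: null,
});
const signature = sign(Buffer.alloc(32, 7), canonical);
```

Because the query string is part of path, changing since alters the signature as well.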

Host binding (anti-cross-tenant replay)

targetHost is the normalized host the request is aimed at, and binding it into the signature is what stops a request signed for one tenant from being replayed against another tenant on the same multi-tenant server.

  • Outbound, the host is derived from the peer's base_url: normalizeHost(new URL(baseUrl).host) (lib/transport.js).
  • Inbound, it is derived from the request: X-Forwarded-Host if present (first value, set by a trusted proxy), otherwise the Host header, normalized the same way (lib/peerAuth.js).

normalizeHost (lib/crypto.js) lowercases, trims, and drops a trailing :80 or :443 so both sides produce byte-identical strings (clients omit the default port from the Host header). Because the canonical includes targetHost, a signature computed for t1.example.com will not verify when the same bytes are re-sent to t2.example.com: the receiver rebuilds the canonical with its own host, the MAC differs, and verification fails with 401 bad signature.

Note (lib/peerAuth.js): the receiver derives the host from the request, NOT from peerRow.base_url. Inbound, base_url is the sender's address (used for pull-back), not the receiver's own host.
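The host derivation on both sides can be sketched like this. It is an approximation of the behavior described above, not the code in lib/crypto.js or lib/peerAuth.js; outboundHost and inboundHost are hypothetical helper names, and headers is assumed to be a Node-style object with lowercase keys.

```javascript
// Lowercase, trim, and drop a trailing default port.
function normalizeHost(host) {
  return String(host).trim().toLowerCase().replace(/:(80|443)$/, "");
}

// Sender side: the host comes from the peer's base_url.
function outboundHost(baseUrl) {
  return normalizeHost(new URL(baseUrl).host);
}

// Receiver side: the host comes from the request, never from base_url.
function inboundHost(headers) {
  const forwarded = headers["x-forwarded-host"];
  const raw = forwarded ? String(forwarded).split(",")[0] : headers["host"];
  return normalizeHost(raw || "");
}

console.log(outboundHost("https://Tenant.Example.com/")); // tenant.example.com
console.log(inboundHost({ host: "tenant.example.com:443" })); // tenant.example.com
```

Both calls yield the same string, which is what lets the two sides rebuild an identical canonical string.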

Verification order

requirePeerAuth (lib/peerAuth.js) checks, in order, and returns null (after sending a 4xx) on the first failure:

  1. All four required headers present, else 400.
  2. Timestamp within the +/- 5 minute skew window (timestampWithinSkew and SKEW_TOLERANCE_MS = 5 * 60 * 1000, both in lib/crypto.js), else 401 timestamp out of skew window.
  3. X-DD-Env-Id resolves to a _dd_peers row, else 401 unknown peer env_id.
  4. The peer has a sealed secret that opens, else 401 peer not provisioned.
  5. Signature matches via constant-time compare (verifySignature, lib/crypto.js, uses crypto.timingSafeEqual), else 401 bad signature.
  6. If there was a body, it parses as JSON (after the signature already covered the raw bytes), else 400 body is not valid JSON.

On success it parses the body into req.body, advances the peer's last_seen_at (touchPeerLastSeen, lib/peers.js), sets req.dd_peer to the peer row, and returns it.
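The verification order can be sketched as a pure function that returns a status instead of writing to a response. Everything about its shape is an assumption for illustration: the name checkPeerRequest, the injected lookupSecret callback (returning a Buffer, null when the secret does not open, or undefined for an unknown peer), and the lowercase header keys. The status codes and messages follow the list above.

```javascript
const crypto = require("node:crypto");

const SKEW_TOLERANCE_MS = 5 * 60 * 1000;
const REQUIRED = ["x-dd-env-id", "x-dd-timestamp", "x-dd-nonce", "x-dd-signature"];

const sha256Hex = (d) => crypto.createHash("sha256").update(d).digest("hex");
const hmacHex = (key, text) =>
  crypto.createHmac("sha256", key).update(text).digest("hex");

function checkPeerRequest({ headers, method, path, targetHost, rawBody, now, lookupSecret }) {
  // 1. All four headers present.
  for (const name of REQUIRED) {
    if (!headers[name]) return { status: 400, error: `missing header ${name}` };
  }
  // 2. Timestamp inside the skew window.
  const ts = Number(headers["x-dd-timestamp"]);
  if (!Number.isFinite(ts) || Math.abs(now - ts) > SKEW_TOLERANCE_MS) {
    return { status: 401, error: "timestamp out of skew window" };
  }
  // 3 and 4. Known peer with a secret that opens.
  const secret = lookupSecret(headers["x-dd-env-id"]);
  if (secret === undefined) return { status: 401, error: "unknown peer env_id" };
  if (secret === null) return { status: 401, error: "peer not provisioned" };
  // 5. Constant-time signature compare over the raw body bytes.
  const canonical = [
    headers["x-dd-timestamp"],
    headers["x-dd-nonce"],
    method.toUpperCase(),
    path,
    targetHost,
    sha256Hex(rawBody),
  ].join("\n");
  const expected = Buffer.from(hmacHex(secret, canonical), "hex");
  const given = Buffer.from(headers["x-dd-signature"], "hex");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { status: 401, error: "bad signature" };
  }
  // 6. Only now parse the body.
  if (rawBody.length === 0) return { status: 200, body: undefined };
  try {
    return { status: 200, body: JSON.parse(rawBody.toString("utf8")) };
  } catch {
    return { status: 400, error: "body is not valid JSON" };
  }
}
```

Passing a different targetHost for the same headers and body yields "bad signature", which is the cross-tenant replay case from the host-binding section.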

The nonce is sent and signed, but the current code keeps no server-side seen-nonce cache. Replay protection therefore rests on the skew window and the host binding alone.

Promote, pull, and anchors

Sync progress is tracked per peer and per direction in _dd_anchors (lib/schema.js):

| Column | Meaning |
| --- | --- |
| peer_id | FK-by-convention to _dd_peers.peer_id (PK part). |
| direction | outbound or inbound (PK part). |
| last_op_id | The last op id synced in that direction for that peer. |
| updated_at | ISO 8601 of the last advance. |

PRIMARY KEY (peer_id, direction) means at most one outbound and one inbound watermark per peer.
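The effect of the composite key can be modelled with an in-memory map keyed by peer and direction. This is only a stand-in for the table to show the upsert semantics; the real helpers operate on the database and their signatures may differ.

```javascript
// In-memory stand-in for _dd_anchors, keyed by "peer_id:direction".
const anchors = new Map();

function upsertAnchor(peerId, direction, lastOpId, now = new Date()) {
  if (direction !== "outbound" && direction !== "inbound") {
    throw new Error(`unknown direction: ${direction}`);
  }
  // Same key overwrites, so each peer has at most one row per direction.
  anchors.set(`${peerId}:${direction}`, {
    peer_id: peerId,
    direction,
    last_op_id: lastOpId,
    updated_at: now.toISOString(),
  });
}

function getAnchor(peerId, direction) {
  return anchors.get(`${peerId}:${direction}`) || null;
}

upsertAnchor(1, "outbound", "op-10");
upsertAnchor(1, "outbound", "op-25"); // advances the same watermark
upsertAnchor(1, "inbound", "op-7");
console.log(anchors.size); // 2
```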

Both promote and pull select only ops authored by the local env (source_env_id = env.env_id), and only those whose created_at is later than the anchor op's created_at. If there is no anchor, sync starts from the epoch (the whole journal). Helpers: getOutboundAnchor / getInboundAnchor / upsertAnchor (lib/routes.js).
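The anchor-relative selection can be sketched over an in-memory array of ops. The real code queries the journal table; the function name selectOpsAfterAnchor, the string comparison of ISO 8601 timestamps, and the fallback to the epoch when the anchor op is not found are assumptions made for this example.

```javascript
// Return the local env's ops created after the anchor op, oldest first.
function selectOpsAfterAnchor(ops, localEnvId, anchorOpId, limit) {
  const anchor =
    anchorOpId == null ? undefined : ops.find((op) => op.op_id === anchorOpId);
  // No anchor (or an unknown one, by assumption) means "from the epoch".
  const after = anchor ? anchor.created_at : "";
  return ops
    .filter((op) => op.source_env_id === localEnvId && op.created_at > after)
    .sort((a, b) =>
      a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0
    )
    .slice(0, limit);
}

const journal = [
  { op_id: "a", source_env_id: "local", created_at: "2026-01-01T00:00:00.000Z" },
  { op_id: "b", source_env_id: "remote", created_at: "2026-01-02T00:00:00.000Z" },
  { op_id: "c", source_env_id: "local", created_at: "2026-01-03T00:00:00.000Z" },
  { op_id: "d", source_env_id: "local", created_at: "2026-01-04T00:00:00.000Z" },
];

const fromEpoch = selectOpsAfterAnchor(journal, "local", null, 500).map((o) => o.op_id);
const afterA = selectOpsAfterAnchor(journal, "local", "a", 500).map((o) => o.op_id);
console.log(fromEpoch, afterA); // [ 'a', 'c', 'd' ] [ 'c', 'd' ]
```

Op b is skipped in both cases because it was authored by another env; ops received from a peer are not forwarded onward by this selection.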

Promote (push ops to a peer)

promote (lib/routes.js):

  1. Look up the peer and the local env; read the outbound anchor.
  2. Select the local env's ops after the anchor, oldest first, LIMIT 500 (lib/routes.js). If none, redirect with "no ops to promote".
  3. signedFetch POST /dev-deploy/api/ingest with { ops } and the peer's secret (lib/routes.js).
  4. On success, advance the outbound anchor to the last op's op_id (upsertAnchor(peerId, "outbound", ...), lib/routes.js).
  5. Summarize applied/error counts from the response and append any plugin-version warnings from diffPluginsWithPeer (lib/routes.js, which calls /dev-deploy/api/health).
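The control flow of the steps above can be sketched with the I/O injected as callbacks, so the ordering (send first, advance the anchor only on success) is visible in isolation. The function promoteOnce, its three callbacks, and the per-result error field are hypothetical; the real promote handler works against the database and signedFetch directly.

```javascript
// Hypothetical dependencies:
//   selectOps(limit)    -> ops after the outbound anchor, oldest first
//   sendIngest(ops)     -> Promise of the peer's { received, results } response
//   advanceAnchor(opId) -> records the new outbound watermark
async function promoteOnce({ selectOps, sendIngest, advanceAnchor }) {
  const ops = selectOps(500);
  if (ops.length === 0) return { message: "no ops to promote" };

  // If this throws, the anchor stays put and the same ops are retried later.
  const response = await sendIngest(ops);

  advanceAnchor(ops[ops.length - 1].op_id);

  // Assumption: each result carries an `error` field when the op failed.
  const results = response.results || [];
  const errors = results.filter((r) => r.error).length;
  return { message: `applied ${results.length - errors}, errors ${errors}` };
}
```

Because at most 500 ops go out per call, a journal with a longer backlog needs several promote runs to drain.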

planView (lib/routes.js) is the dry run: same anchor-relative selection (LIMIT 500) but rendered as a preview table instead of being sent.

The receiving side, apiIngest (lib/routes.js), authenticates, applies the batch with applyBatch, and advances its inbound anchor for the sender to the last received op_id (lib/routes.js).

Pull (fetch a peer's ops)

pull (lib/routes.js):

  1. Read the inbound anchor; build the path /dev-deploy/api/journal?since=<last_op_id> (or no since if no anchor) (lib/routes.js).
  2. signedFetch GET that path (lib/routes.js).
  3. Apply the returned ops with applyBatch (lib/routes.js).
  4. Advance the inbound anchor to the last pulled op's op_id (lib/routes.js).
  5. Summarize applied/error/conflict counts and plugin warnings.
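Steps 1 and 4 reduce to two small pure helpers, sketched below. The names journalPath and nextInboundAnchor are hypothetical, as are the URL-encoding of the op id and the choice to leave the anchor unchanged when a pull returns nothing.

```javascript
// Step 1: build the signed GET path from the inbound anchor.
function journalPath(lastOpId) {
  const base = "/dev-deploy/api/journal";
  return lastOpId == null
    ? base
    : `${base}?since=${encodeURIComponent(lastOpId)}`;
}

// Step 4: the new anchor is the last pulled op; an empty pull is assumed
// to leave the anchor where it was.
function nextInboundAnchor(currentOpId, pulledOps) {
  return pulledOps.length === 0
    ? currentOpId
    : pulledOps[pulledOps.length - 1].op_id;
}

console.log(journalPath(null)); // /dev-deploy/api/journal
console.log(journalPath("op-41")); // /dev-deploy/api/journal?since=op-41
```

The exact path string matters because it is one of the six signed fields: the sender signs the path with its query string, and the receiver rebuilds it from the request URL.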

The serving side, apiJournal (lib/routes.js), returns the local env's ops after since (resolved to the op's created_at), oldest first, LIMIT 1000 (lib/routes.js), as { source_env_id, ops }.

Mixed-topology peering

A standalone instance and a specific tenant on a multi-tenant server peer the same way as two standalone instances; the only difference is the base_url.

  • Address the tenant by its tenant hostname as base_url, e.g. https://tenant.example.com. Saltcorn routes the request to that tenant by host, and dev-deploy's tables are schema-qualified per tenant (db.getTenantSchemaPrefix() is used throughout, e.g. lib/routes.js), so the peer row, ops, and anchors all live in that tenant's schema.
  • The host binding makes this safe: the signature is computed over the tenant hostname (outbound from base_url; inbound from X-Forwarded-Host / Host). A request signed for one tenant cannot be replayed against another tenant on the same server, because each tenant's host produces a different canonical string (see Host binding).
  • Each side still stores the other's env_id and base_url in its own _dd_peers. A standalone instance points base_url at the tenant's hostname; the tenant points base_url back at the standalone instance's hostname.

If a reverse proxy fronts the tenants, it must set X-Forwarded-Host to the tenant hostname so the inbound canonical matches the outbound one (lib/peerAuth.js).

Endpoint reference

All four machine-API routes are registered with noCsrf: true (lib/routes.js) and require HMAC peer auth via requirePeerAuth. The admin peer/sync routes require a session with the admin role (role_id === 1, isAdmin, lib/routes.js) and use CSRF fields.

Machine API (HMAC peer auth)

| Method | Path | Handler | File | Purpose |
| --- | --- | --- | --- | --- |
| GET | /dev-deploy/api/journal?since=op_id | apiJournal | lib/routes.js | Return local env ops after since, oldest first, max 1000. Returns { source_env_id, ops }. |
| POST | /dev-deploy/api/ingest | apiIngest | lib/routes.js | Apply { ops } from a peer; advance that peer's inbound anchor. Returns { received, results }. |
| GET | /dev-deploy/api/file/:uuid | apiFile | lib/routes.js | Stream a file entity's bytes by UUID (octet-stream). 404 if no _dd_entity_ids mapping for kind file. |
| GET | /dev-deploy/api/health | apiHealth | lib/routes.js | Return { env_id, label, plugins } for plugin-drift checks. |

Admin peer and sync routes (session + admin role)

| Method | Path | Handler | File | Purpose |
| --- | --- | --- | --- | --- |
| GET | /admin/dev-deploy/peers | peersView | lib/routes.js | List peers, show this env's env_id, add-peer form. |
| POST | /admin/dev-deploy/peers/add | peersAdd | lib/routes.js | Pair a peer; generate or accept a 64-hex secret. |
| POST | /admin/dev-deploy/peers/rotate | peersRotate | lib/routes.js | Rotate a peer's shared secret (shown once). |
| POST | /admin/dev-deploy/peers/delete | peersDelete | lib/routes.js | Delete a peer and its anchors. |
| GET | /admin/dev-deploy/plan | planView | lib/routes.js | Preview ops that would be promoted to a peer. |
| POST | /admin/dev-deploy/promote | promote | lib/routes.js | Push outbound ops to a peer via signed ingest. |
| POST | /admin/dev-deploy/pull | pull | lib/routes.js | Pull a peer's ops via signed journal and apply them. |

File reference

| File | Responsibility |
| --- | --- |
| lib/peers.js | _dd_peers CRUD; seal/open the shared secret; peerSecret, addPeer, rotatePeerSecret, deletePeer, touchPeerLastSeen. |
| lib/crypto.js | AES-256-GCM seal/open, HKDF KEK, HMAC sign/verify, buildCanonical, normalizeHost, skew check, random secret/nonce. |
| lib/transport.js | Outbound signed requests: signedFetch (JSON) and signedFetchBinary (raw bytes). |
| lib/peerAuth.js | Inbound requirePeerAuth: header check, skew, peer lookup, raw-body HMAC verify, host binding. |
| lib/routes.js | Admin UI for pairing/plan/promote/pull and the four machine-API handlers. |
| lib/schema.js | _dd_peers (:38) and _dd_anchors (:116) table definitions. |