sc-dev-deploy/docs/peering.md
2026-06-01 16:43:43 -05:00

# dev-deploy peering
How two dev-deploy instances find, authenticate, and exchange ops with each
other: the peer model, the pairing flow, the HMAC wire protocol, the sync
anchors that drive promote and pull, and how a standalone instance pairs with a
single tenant on a multi-tenant server.
See also: [architecture.md](architecture.md) for the ops journal, stable UUIDs,
and the apply pipeline; the [README](../README.md) for the full table and
endpoint inventory.
## Contents
- [The peer model](#the-peer-model)
- [Pairing flow](#pairing-flow)
- [The HMAC wire protocol](#the-hmac-wire-protocol)
- [Promote, pull, and anchors](#promote-pull-and-anchors)
- [Mixed-topology peering](#mixed-topology-peering)
- [Endpoint reference](#endpoint-reference)
- [File reference](#file-reference)
## The peer model
A peer is one row in `_dd_peers` (defined in `lib/schema.js`). Each instance
stores a row per peer it talks to; the relationship is configured independently
on both sides (there is no central registry).
| Column | Type | Meaning |
| --- | --- | --- |
| `peer_id` | `serial` / `integer` PK | Local surrogate id (auto-assigned). |
| `env_id` | `TEXT` UNIQUE | The peer's dev-deploy `env_id` (the other side's `_dd_env.env_id`). |
| `label` | `TEXT` | Optional human label (e.g. `test`, `prod`). |
| `base_url` | `TEXT` | Where to reach the peer (e.g. `http://localhost:3001` or `https://tenant.example.com`). |
| `peer_secret_ciphertext` | `TEXT` | Sealed shared secret (hex). |
| `peer_secret_iv` | `TEXT` | AES-GCM IV (hex). |
| `peer_secret_tag` | `TEXT` | AES-GCM auth tag (hex). |
| `require_tls` | `INTEGER` | TLS-required flag (stored as 0/1). |
| `created_at` | `TEXT` | ISO 8601 creation time. |
| `last_seen_at` | `TEXT` | ISO 8601 of the last verified inbound request from this peer; `null` until first contact. |
### The sealed shared secret
The shared secret is 32 random bytes (`randomSecret()`, `lib/crypto.js`). It
is never stored in plaintext. At rest it is sealed with AES-256-GCM
(`seal()`, `lib/crypto.js`) and split across the three `peer_secret_*`
columns as hex. The hex-text storage is deliberate: Saltcorn's SQLite insert
layer JSON-stringifies object values, which would mangle a raw `Buffer` column
(`lib/schema.js`).
The 32-byte key-encryption key (KEK) used by `seal`/`open` is derived once per
process via HKDF-SHA256 from `SALTCORN_SESSION_SECRET` (`getKek()`,
`lib/crypto.js`; falls back to the Saltcorn `session_secret` config). Because
the KEK is tied to the session secret, rotating `SALTCORN_SESSION_SECRET`
invalidates every stored pairing -- existing ciphertexts no longer decrypt
(documented in `lib/crypto.js`).
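The sealing scheme described above can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the plugin's code: the function names `deriveKek`, `sealSecret`, and `openSecret`, the empty HKDF salt, the `info` label, and the 12-byte IV are all assumptions made for the example.

```javascript
const crypto = require("node:crypto");

// Derive a 32-byte KEK from the session secret with HKDF-SHA256.
// Assumption: the salt and info label are placeholders, not the plugin's values.
function deriveKek(sessionSecret) {
  const okm = crypto.hkdfSync(
    "sha256",
    Buffer.from(sessionSecret, "utf8"),
    Buffer.alloc(0), // assumed: empty salt
    Buffer.from("example-peer-kek"), // assumed: info label
    32
  );
  return Buffer.from(okm); // hkdfSync returns an ArrayBuffer
}

// Seal with AES-256-GCM and return the three hex-text columns.
function sealSecret(kek, plaintext) {
  const iv = crypto.randomBytes(12); // assumed: 96-bit IV, the common GCM size
  const cipher = crypto.createCipheriv("aes-256-gcm", kek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return {
    peer_secret_ciphertext: ciphertext.toString("hex"),
    peer_secret_iv: iv.toString("hex"),
    peer_secret_tag: cipher.getAuthTag().toString("hex"),
  };
}

// Open the sealed columns; final() throws if the auth tag does not verify.
function openSecret(kek, row) {
  const decipher = crypto.createDecipheriv(
    "aes-256-gcm",
    kek,
    Buffer.from(row.peer_secret_iv, "hex")
  );
  decipher.setAuthTag(Buffer.from(row.peer_secret_tag, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(row.peer_secret_ciphertext, "hex")),
    decipher.final(),
  ]);
}

const kek = deriveKek("example-session-secret");
const sharedSecret = crypto.randomBytes(32);
const row = sealSecret(kek, sharedSecret);
const roundTrip = openSecret(kek, row).equals(sharedSecret);
```

In this sketch, opening `row` with a KEK derived from a different session secret throws, which is the behaviour the rotation warning above describes.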
Plaintext only crosses the process boundary at two moments:
- At pairing time, when the operator copies the secret into the other side's
pairing form.
- At HMAC sign/verify time, when `peerSecret()` (`lib/peers.js`) opens the
sealed bytes to compute or check a signature.
`rowToPeer()` (`lib/peers.js`) deliberately omits the sealed columns from the
plain accessor; callers must go through `peerSecret()` / `peerSecretByEnvId()`.
## Pairing flow
Pairing is symmetric: each side ends up with a `_dd_peers` row pointing at the
other side's `env_id` and `base_url`, and both rows seal the *same* shared
secret. One side generates the secret; the operator pastes it into the other.
Each instance's own `env_id` is shown on its Peers page (`peersView`,
`lib/routes.js`): "This instance's env_id is ... Paste this into the other
instance's peer form." The `env_id` itself is a random UUID minted once at
bootstrap (`lib/env.js`).
Steps:
1. On instance A, open `/admin/dev-deploy/peers` and submit the **Add peer**
form (`peersAdd`, `lib/routes.js`) with the peer's `env_id` (B's), an
optional `label`, B's `base_url`, and an optional `require_tls` checkbox.
Leave **Existing secret** blank.
2. `addPeer` (`lib/peers.js`) generates a fresh 32-byte secret, seals it, and
inserts the row. The plaintext secret is rendered once as 64 hex characters
on the confirmation page (`lib/routes.js`) -- "it will not be shown
again."
3. On instance B, open its own Peers page and submit **Add peer** with A's
`env_id`, A's `base_url`, and paste the 64-hex secret into the **Existing
secret** field. `peersAdd` validates it against `/^[0-9a-fA-F]{64}$/`
(`lib/routes.js`) and passes it to `addPeer` as `existingSecret`, so B
seals the identical secret rather than generating a new one.
After both rows exist, A and B share one secret and each knows the other's
`env_id` and `base_url`.
`env_id` is enforced UNIQUE, so re-adding the same peer fails with "peer with
env_id ... already exists" (`lib/peers.js`).
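The generate-or-accept decision in the steps above can be sketched as follows. `resolvePairingSecret` is a hypothetical helper name used only for illustration; the regular expression is the one quoted above, and the error text is an assumption.

```javascript
const crypto = require("node:crypto");

// Same pattern the pairing form is described as using for "Existing secret".
const SECRET_HEX = /^[0-9a-fA-F]{64}$/;

// Return the secret bytes to seal: the pasted secret if one was supplied,
// otherwise 32 fresh random bytes.
function resolvePairingSecret(existingSecret) {
  if (existingSecret === undefined || existingSecret === "") {
    return crypto.randomBytes(32); // side A: generate
  }
  if (!SECRET_HEX.test(existingSecret)) {
    throw new Error("existing secret must be 64 hex characters"); // assumed wording
  }
  return Buffer.from(existingSecret, "hex"); // side B: reuse A's secret
}

// Side A generates and shows the hex once; side B pastes that same string.
const secretOnA = resolvePairingSecret("");
const shownOnce = secretOnA.toString("hex");
const secretOnB = resolvePairingSecret(shownOnce);
const sidesMatch = secretOnA.equals(secretOnB);
```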
### Rotation and deletion
- **Rotate** (`peersRotate`, `lib/routes.js` -> `rotatePeerSecret`,
`lib/peers.js`) mints a new secret for an existing peer, re-seals it, and
shows the new value once. The operator must paste the new secret on the other
side (re-pair or rotate there) or the pairing breaks.
- **Delete** (`peersDelete`, `lib/routes.js` -> `deletePeer`,
`lib/peers.js`) removes the `_dd_peers` row *and* deletes that peer's
`_dd_anchors` rows, so a later re-pair starts syncing from the epoch again.
## The HMAC wire protocol
Every machine-API request is signed with the shared secret using HMAC-SHA256.
The outbound side is `lib/transport.js`; the inbound check is `requirePeerAuth`
(`lib/peerAuth.js`).
### Headers
| Header | Source | Meaning |
| --- | --- | --- |
| `X-DD-Env-Id` | sender's own `env_id` | Caller identity; the receiver looks it up in `_dd_peers` via `findPeerByEnvId` to find the matching secret. |
| `X-DD-Timestamp` | `String(Date.now())` | Milliseconds since epoch. |
| `X-DD-Nonce` | `randomNonce().toString("hex")` | 16 random bytes, hex. Signed, but not tracked server-side (see [Verification order](#verification-order)). |
| `X-DD-Signature` | `sign(secret, canonical)` | Hex HMAC-SHA256 over the canonical string. |
All four headers are required; a missing one returns `400 missing header ...`
(`lib/peerAuth.js`).
When there is a request body, the sender sets
`Content-Type: application/vnd.dev-deploy+json` (`lib/transport.js`). This
custom type stops Saltcorn's `express.json()` middleware from consuming the
request stream, so the receiver can read the exact raw bytes and HMAC them
verbatim -- no re-serialization, no whitespace or key-order assumptions
(`lib/peerAuth.js`).
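A small demonstration of why the raw bytes matter: two byte sequences can carry the same JSON value yet hash differently, so a receiver that re-serialized a parsed body could not reproduce the sender's digest. The sample body is made up for illustration.

```javascript
const crypto = require("node:crypto");

const sha256Hex = (data) =>
  crypto.createHash("sha256").update(data).digest("hex");

// Bytes as a sender might put them on the wire (spacing and key order are arbitrary).
const rawBody = Buffer.from('{"ops": [ {"b": 1, "a": 2} ]}', "utf8");

// What the receiver would have if the body were parsed and then re-serialized.
const reserialized = Buffer.from(
  JSON.stringify(JSON.parse(rawBody.toString("utf8"))),
  "utf8"
);

// Same JSON value, different bytes, therefore a different digest.
const sameDigest = sha256Hex(rawBody) === sha256Hex(reserialized);
console.log(sameDigest); // false
```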
### The canonical string
Both sides build the signed string with `buildCanonical` (`lib/crypto.js`).
It is six fields joined by newlines (`\n`):
```
timestamp
nonce
METHOD
path
targetHost
sha256hex(body)
```
- `METHOD` is uppercased.
- `path` is the request path including query string. Outbound it is the literal
`path` argument; inbound it is `req.originalUrl || req.url`
(`lib/peerAuth.js`).
- `body` is hashed with SHA-256 (`sha256Hex`, `lib/crypto.js`); an empty body
hashes the empty string. GET/HEAD never have a body
(`lib/peerAuth.js`).
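The six-field layout can be sketched like this. The function names mirror those mentioned above, but the object-style parameter and the bodies are assumptions for illustration, not the plugin's source; the timestamp, nonce, and secret are sample values.

```javascript
const crypto = require("node:crypto");

function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Join the six fields with "\n" in the documented order.
function buildCanonical({ timestamp, nonce, method, path, targetHost, body }) {
  return [
    String(timestamp),
    nonce,
    method.toUpperCase(),
    path, // includes the query string
    targetHost,
    sha256Hex(body ?? ""), // an empty body hashes the empty string
  ].join("\n");
}

// Hex HMAC-SHA256 over the canonical string.
function sign(secret, canonical) {
  return crypto.createHmac("sha256", secret).update(canonical).digest("hex");
}

const canonical = buildCanonical({
  timestamp: 1700000000000,
  nonce: "00112233445566778899aabbccddeeff",
  method: "get",
  path: "/dev-deploy/api/journal?since=42",
  targetHost: "tenant.example.com",
  body: "",
});
const signature = sign(Buffer.alloc(32, 1), canonical);
```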
### Host binding (anti-cross-tenant replay)
`targetHost` is the normalized host the request is aimed at, and binding it into
the signature is what stops a request signed for one tenant from being replayed
against another tenant on the same multi-tenant server.
- Outbound, the host is derived from the peer's `base_url`:
`normalizeHost(new URL(baseUrl).host)` (`lib/transport.js`).
- Inbound, it is derived from the request: the first value of
`X-Forwarded-Host` (set by a trusted proxy) if present, otherwise the `Host`
header, normalized the same way (`lib/peerAuth.js`).
`normalizeHost` (`lib/crypto.js`) lowercases, trims, and drops a trailing
`:80` or `:443` so both sides produce byte-identical strings (clients omit the
default port from the `Host` header). Because the canonical includes
`targetHost`, a signature computed for `t1.example.com` will not verify when the
same bytes are re-sent to `t2.example.com`: the receiver rebuilds the canonical
with its own host, the MAC differs, and verification fails with
`401 bad signature`.
Note (`lib/peerAuth.js`): the receiver derives the host from the request, NOT
from `peerRow.base_url`. Inbound, `base_url` is the *sender's* address (used for
pull-back), not the receiver's own host.
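A sketch of the normalization and of the cross-tenant effect, under stated assumptions: `normalizeHost` here is a reimplementation from the description above, and `signFor` is a hypothetical helper that fixes every canonical field except the host.

```javascript
const crypto = require("node:crypto");

// Lowercase, trim, and drop a trailing default port.
function normalizeHost(host) {
  return host.trim().toLowerCase().replace(/:(80|443)$/, "");
}

// Hypothetical helper: sign a canonical string in which only the host varies.
function signFor(secret, targetHost) {
  const canonical = [
    "1700000000000",
    "00112233445566778899aabbccddeeff",
    "POST",
    "/dev-deploy/api/ingest",
    targetHost,
    crypto.createHash("sha256").update('{"ops":[]}').digest("hex"),
  ].join("\n");
  return crypto.createHmac("sha256", secret).update(canonical).digest("hex");
}

const secret = Buffer.alloc(32, 7); // sample secret
const sentTo = normalizeHost(new URL("https://t1.example.com").host);
const signedForT1 = signFor(secret, sentTo);

// The same bytes arriving at another tenant are checked against that tenant's host.
const recomputedAtT2 = signFor(secret, normalizeHost("T2.Example.com:443"));
const replayAccepted = signedForT1 === recomputedAtT2;
console.log(replayAccepted); // false
```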
### Verification order
`requirePeerAuth` (`lib/peerAuth.js`) checks, in order, and returns `null`
(after sending a 4xx) on the first failure:
1. All four required headers present, else `400`.
2. Timestamp within the +/- 5 minute skew window
(`timestampWithinSkew` and `SKEW_TOLERANCE_MS = 5 * 60 * 1000`, both in
`lib/crypto.js`), else `401 timestamp out of skew window`.
3. `X-DD-Env-Id` resolves to a `_dd_peers` row, else
`401 unknown peer env_id`.
4. The peer has a sealed secret that opens, else `401 peer not provisioned`.
5. Signature matches via constant-time compare (`verifySignature`,
`lib/crypto.js`, uses `crypto.timingSafeEqual`), else
`401 bad signature`.
6. If there was a body, it parses as JSON (after the signature already covered
the raw bytes), else `400 body is not valid JSON`.
On success it parses the body into `req.body`, advances the peer's
`last_seen_at` (`touchPeerLastSeen`, `lib/peers.js`), sets `req.dd_peer` to
the peer row, and returns it.
The nonce is sent and signed, but the current code keeps no server-side
seen-nonce cache (none was found in the code read), so replay protection rests
on the skew window and the host binding.
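The check order can be sketched as a pure function. This is an illustration only: `verify`, `canonicalFor`, and `makeSigned` are hypothetical names, the request is a plain object rather than an Express `req`, and a `Map` of `env_id` to secret stands in for `_dd_peers` and `peerSecret()`.

```javascript
const crypto = require("node:crypto");

const SKEW_TOLERANCE_MS = 5 * 60 * 1000;
const REQUIRED = ["x-dd-env-id", "x-dd-timestamp", "x-dd-nonce", "x-dd-signature"];

function canonicalFor(req) {
  const bodyHash = crypto.createHash("sha256").update(req.rawBody).digest("hex");
  return [
    req.headers["x-dd-timestamp"],
    req.headers["x-dd-nonce"],
    req.method.toUpperCase(),
    req.path,
    req.targetHost, // already normalized by the caller
    bodyHash,
  ].join("\n");
}

// Returns { status, error } on the first failing check, else { status: 200, body }.
function verify(req, secrets, now = Date.now()) {
  for (const name of REQUIRED) {
    if (!req.headers[name]) return { status: 400, error: `missing header ${name}` };
  }
  const ts = Number(req.headers["x-dd-timestamp"]);
  if (!Number.isFinite(ts) || Math.abs(now - ts) > SKEW_TOLERANCE_MS) {
    return { status: 401, error: "timestamp out of skew window" };
  }
  const envId = req.headers["x-dd-env-id"];
  if (!secrets.has(envId)) return { status: 401, error: "unknown peer env_id" };
  const secret = secrets.get(envId);
  if (!secret) return { status: 401, error: "peer not provisioned" };

  const expected = crypto.createHmac("sha256", secret).update(canonicalFor(req)).digest();
  const given = Buffer.from(req.headers["x-dd-signature"], "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { status: 401, error: "bad signature" };
  }
  let body;
  if (req.rawBody.length > 0) {
    try {
      body = JSON.parse(req.rawBody.toString("utf8")); // only after the MAC checked out
    } catch {
      return { status: 400, error: "body is not valid JSON" };
    }
  }
  return { status: 200, body };
}

// Hypothetical helper that builds a correctly signed request object.
function makeSigned(secret, envId, targetHost, bodyText, now) {
  const req = {
    method: "POST",
    path: "/dev-deploy/api/ingest",
    targetHost,
    rawBody: Buffer.from(bodyText, "utf8"),
    headers: {
      "x-dd-env-id": envId,
      "x-dd-timestamp": String(now),
      "x-dd-nonce": crypto.randomBytes(16).toString("hex"),
    },
  };
  req.headers["x-dd-signature"] = crypto
    .createHmac("sha256", secret)
    .update(canonicalFor(req))
    .digest("hex");
  return req;
}

const secret = crypto.randomBytes(32);
const secrets = new Map([["env-a", secret]]);
const now = Date.now();
const good = makeSigned(secret, "env-a", "t1.example.com", '{"ops":[]}', now);
console.log(verify(good, secrets, now).status); // 200

const replayed = { ...good, targetHost: "t2.example.com" };
console.log(verify(replayed, secrets, now).error); // bad signature
```

Because the sketch keeps no nonce store either, re-submitting `good` to the same host within the skew window would pass again.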
## Promote, pull, and anchors
Sync progress is tracked per peer and per direction in `_dd_anchors`
(`lib/schema.js`):
| Column | Meaning |
| --- | --- |
| `peer_id` | FK-by-convention to `_dd_peers.peer_id` (PK part). |
| `direction` | `outbound` or `inbound` (PK part). |
| `last_op_id` | The last op id synced in that direction for that peer. |
| `updated_at` | ISO 8601 of the last advance. |
`PRIMARY KEY (peer_id, direction)` means at most one outbound and one inbound
watermark per peer.
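The effect of the composite key can be sketched with an in-memory `Map` standing in for the table. The three-argument `upsertAnchor` shape, the `getAnchor` helper, and the string op ids are assumptions for illustration.

```javascript
// In-memory stand-in for _dd_anchors, keyed by (peer_id, direction).
const anchors = new Map();

function upsertAnchor(peerId, direction, lastOpId, now = new Date()) {
  if (direction !== "outbound" && direction !== "inbound") {
    throw new Error(`unknown direction: ${direction}`);
  }
  // Setting an existing key replaces the row, like an upsert on the primary key.
  anchors.set(`${peerId}:${direction}`, {
    peer_id: peerId,
    direction,
    last_op_id: lastOpId,
    updated_at: now.toISOString(),
  });
}

function getAnchor(peerId, direction) {
  return anchors.get(`${peerId}:${direction}`) ?? null;
}

upsertAnchor(1, "outbound", "op-10");
upsertAnchor(1, "outbound", "op-25"); // advances the same watermark
upsertAnchor(1, "inbound", "op-7");
console.log(anchors.size); // 2
```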
Both promote and pull select only ops authored by the *local* env of the
instance whose journal is being read (`source_env_id = env.env_id`), and only
those whose `created_at` is later than the anchor op's `created_at`. If there
is no anchor, sync starts from the epoch (the whole journal). Helpers:
`getOutboundAnchor` / `getInboundAnchor` / `upsertAnchor` (`lib/routes.js`).
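The anchor-relative selection can be sketched over an in-memory journal. `selectOpsAfterAnchor` is a hypothetical name; the real code runs a SQL query. Treating an anchor id that is not found as "no anchor" is an assumption of this sketch, and the string comparison relies on all `created_at` values sharing one ISO 8601 format.

```javascript
// Select ops authored by localEnvId that are newer than the anchor op, oldest first.
function selectOpsAfterAnchor(journal, localEnvId, anchorOpId, limit) {
  const anchorOp =
    anchorOpId == null ? null : journal.find((op) => op.op_id === anchorOpId);
  const after = anchorOp ? anchorOp.created_at : null; // null means "from the epoch"
  return journal
    .filter((op) => op.source_env_id === localEnvId)
    .filter((op) => after === null || op.created_at > after)
    .sort((a, b) =>
      a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0
    )
    .slice(0, limit);
}

const journal = [
  { op_id: "a", source_env_id: "dev", created_at: "2026-01-01T00:00:00.000Z" },
  { op_id: "b", source_env_id: "prod", created_at: "2026-01-02T00:00:00.000Z" },
  { op_id: "c", source_env_id: "dev", created_at: "2026-01-03T00:00:00.000Z" },
  { op_id: "d", source_env_id: "dev", created_at: "2026-01-04T00:00:00.000Z" },
];

const afterA = selectOpsAfterAnchor(journal, "dev", "a", 500).map((op) => op.op_id);
console.log(afterA); // [ 'c', 'd' ]
```

Op `b` is skipped even though it is newer than the anchor, because it was authored by a different env.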
### Promote (push ops to a peer)
`promote` (`lib/routes.js`):
1. Look up the peer and the local env; read the outbound anchor.
2. Select the local env's ops after the anchor, oldest first, `LIMIT 500`
(`lib/routes.js`). If none, redirect with "no ops to promote".
3. `signedFetch` `POST /dev-deploy/api/ingest` with `{ ops }` and the peer's
secret (`lib/routes.js`).
4. On success, advance the outbound anchor to the last op's `op_id`
(`upsertAnchor(peerId, "outbound", ...)`, `lib/routes.js`).
5. Summarize applied/error counts from the response and append any plugin-
version warnings from `diffPluginsWithPeer` (`lib/routes.js`, which calls
`/dev-deploy/api/health`).
`planView` (`lib/routes.js`) is the dry run: same anchor-relative selection
(`LIMIT 500`) but rendered as a preview table instead of being sent.
The receiving side, `apiIngest` (`lib/routes.js`), authenticates, applies
the batch with `applyBatch`, and advances *its* `inbound` anchor for the sender
to the last received `op_id` (`lib/routes.js`).
### Pull (fetch a peer's ops)
`pull` (`lib/routes.js`):
1. Read the inbound anchor; build the path
`/dev-deploy/api/journal?since=<last_op_id>` (or no `since` if no anchor)
(`lib/routes.js`).
2. `signedFetch` `GET` that path (`lib/routes.js`).
3. Apply the returned `ops` with `applyBatch` (`lib/routes.js`).
4. Advance the inbound anchor to the last pulled op's `op_id`
(`lib/routes.js`).
5. Summarize applied/error/conflict counts and plugin warnings.
The serving side, `apiJournal` (`lib/routes.js`), returns the local env's
ops after `since` (resolved to the op's `created_at`), oldest first,
`LIMIT 1000`, as `{ source_env_id, ops }`.
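Steps 1 and 4 of the pull can be sketched as two small helpers. `journalPath` and `nextInboundAnchor` are hypothetical names, and URL-encoding the `since` value is an assumption rather than something confirmed from the source.

```javascript
// Build the journal path, adding ?since= only when an inbound anchor exists.
function journalPath(lastOpId) {
  const base = "/dev-deploy/api/journal";
  return lastOpId == null
    ? base
    : `${base}?since=${encodeURIComponent(lastOpId)}`;
}

// The new inbound anchor is the last pulled op's op_id; an empty pull keeps the old one.
function nextInboundAnchor(currentOpId, pulledOps) {
  return pulledOps.length > 0
    ? pulledOps[pulledOps.length - 1].op_id
    : currentOpId;
}

console.log(journalPath(null)); // /dev-deploy/api/journal
console.log(journalPath(42)); // /dev-deploy/api/journal?since=42
```

Since the signed `path` includes the query string, the exact string built here is what both sides would need to place in the canonical string.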
## Mixed-topology peering
A standalone instance and a specific tenant on a multi-tenant server peer the
same way as two standalone instances; the only difference is the `base_url`.
- Address the tenant by its tenant hostname as `base_url`, e.g.
`https://tenant.example.com`. Saltcorn routes the request to that tenant by
host, and dev-deploy's tables are schema-qualified per tenant
(`db.getTenantSchemaPrefix()` is used throughout, e.g. `lib/routes.js`),
so the peer row, ops, and anchors all live in that tenant's schema.
- The host binding makes this safe: the signature is computed over the tenant
hostname (outbound from `base_url`; inbound from `X-Forwarded-Host` / `Host`).
A request signed for one tenant cannot be replayed against another tenant on
the same server, because each tenant's host produces a different canonical
string (see [Host binding](#host-binding-anti-cross-tenant-replay)).
- Each side still stores the other's `env_id` and `base_url` in its own
`_dd_peers`. A standalone instance points `base_url` at the tenant's hostname;
the tenant points `base_url` back at the standalone instance's hostname.
If a reverse proxy fronts the tenants, it must set `X-Forwarded-Host` to the
tenant hostname so the inbound canonical matches the outbound one
(`lib/peerAuth.js`).
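The inbound host derivation that this proxy requirement depends on can be sketched as below. `inboundTargetHost` is a hypothetical name, `normalizeHost` is reimplemented from its description, and the header object uses lowercase keys as Node's `req.headers` does.

```javascript
// Lowercase, trim, and drop a trailing default port.
function normalizeHost(host) {
  return host.trim().toLowerCase().replace(/:(80|443)$/, "");
}

// Prefer the first X-Forwarded-Host value, else fall back to Host.
function inboundTargetHost(headers) {
  const forwarded = headers["x-forwarded-host"];
  const raw = forwarded ? String(forwarded).split(",")[0] : headers["host"];
  return normalizeHost(raw || "");
}

// Behind a proxy that forwards the tenant hostname:
const viaProxy = inboundTargetHost({
  "x-forwarded-host": "Tenant.Example.com, proxy.internal",
  host: "10.0.0.5:3000",
});
console.log(viaProxy); // tenant.example.com

// Behind a proxy that does not set X-Forwarded-Host, the backend address is used,
// which would not match a canonical string signed for tenant.example.com.
const noForwarding = inboundTargetHost({ host: "10.0.0.5:3000" });
console.log(noForwarding); // 10.0.0.5:3000
```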
## Endpoint reference
All four machine-API routes are registered with `noCsrf: true`
(`lib/routes.js`) and require HMAC peer auth via
`requirePeerAuth`. The admin peer/sync routes require a session with admin role
(`role_id === 1`, `isAdmin`, `lib/routes.js`) and use CSRF fields.
### Machine API (HMAC peer auth)
| Method | Path | Handler | File | Purpose |
| --- | --- | --- | --- | --- |
| GET | `/dev-deploy/api/journal?since=op_id` | `apiJournal` | `lib/routes.js` | Return local env ops after `since`, oldest first, max 1000. Returns `{ source_env_id, ops }`. |
| POST | `/dev-deploy/api/ingest` | `apiIngest` | `lib/routes.js` | Apply `{ ops }` from a peer; advance that peer's inbound anchor. Returns `{ received, results }`. |
| GET | `/dev-deploy/api/file/:uuid` | `apiFile` | `lib/routes.js` | Stream a file entity's bytes by UUID (octet-stream). 404 if no `_dd_entity_ids` mapping for kind `file`. |
| GET | `/dev-deploy/api/health` | `apiHealth` | `lib/routes.js` | Return `{ env_id, label, plugins }` for plugin-drift checks. |
### Admin peer and sync routes (session + admin role)
| Method | Path | Handler | File | Purpose |
| --- | --- | --- | --- | --- |
| GET | `/admin/dev-deploy/peers` | `peersView` | `lib/routes.js` | List peers, show this env's `env_id`, add-peer form. |
| POST | `/admin/dev-deploy/peers/add` | `peersAdd` | `lib/routes.js` | Pair a peer; generate or accept a 64-hex secret. |
| POST | `/admin/dev-deploy/peers/rotate` | `peersRotate` | `lib/routes.js` | Rotate a peer's shared secret (shown once). |
| POST | `/admin/dev-deploy/peers/delete` | `peersDelete` | `lib/routes.js` | Delete a peer and its anchors. |
| GET | `/admin/dev-deploy/plan` | `planView` | `lib/routes.js` | Preview ops that would be promoted to a peer. |
| POST | `/admin/dev-deploy/promote` | `promote` | `lib/routes.js` | Push outbound ops to a peer via signed `ingest`. |
| POST | `/admin/dev-deploy/pull` | `pull` | `lib/routes.js` | Pull a peer's ops via signed `journal` and apply them. |
## File reference
| File | Responsibility |
| --- | --- |
| `lib/peers.js` | `_dd_peers` CRUD; seal/open the shared secret; `peerSecret`, `addPeer`, `rotatePeerSecret`, `deletePeer`, `touchPeerLastSeen`. |
| `lib/crypto.js` | AES-256-GCM seal/open, HKDF KEK, HMAC sign/verify, `buildCanonical`, `normalizeHost`, skew check, random secret/nonce. |
| `lib/transport.js` | Outbound signed requests: `signedFetch` (JSON) and `signedFetchBinary` (raw bytes). |
| `lib/peerAuth.js` | Inbound `requirePeerAuth`: header check, skew, peer lookup, raw-body HMAC verify, host binding. |
| `lib/routes.js` | Admin UI for pairing/plan/promote/pull and the four machine-API handlers. |
| `lib/schema.js` | `_dd_peers` (`:38`) and `_dd_anchors` (`:116`) table definitions. |