# dev-deploy peering

How two dev-deploy instances find, authenticate, and exchange ops with each
other: the peer model, the pairing flow, the HMAC wire protocol, the sync
anchors that drive promote and pull, and how a standalone instance pairs with a
single tenant on a multi-tenant server.

See also: [architecture.md](architecture.md) for the ops journal, stable UUIDs,
and the apply pipeline; the [README](../README.md) for the full table and
endpoint inventory.

## Contents

- [The peer model](#the-peer-model)
- [Pairing flow](#pairing-flow)
- [The HMAC wire protocol](#the-hmac-wire-protocol)
- [Promote, pull, and anchors](#promote-pull-and-anchors)
- [Mixed-topology peering](#mixed-topology-peering)
- [Endpoint reference](#endpoint-reference)
- [File reference](#file-reference)

## The peer model

A peer is one row in `_dd_peers` (defined in `lib/schema.js`). Each instance
stores one row per peer it talks to, and the relationship is configured
independently on both sides; there is no central registry.

| Column | Type | Meaning |
| --- | --- | --- |
| `peer_id` | `serial` / `integer` PK | Local surrogate id (auto-assigned). |
| `env_id` | `TEXT` UNIQUE | The peer's dev-deploy `env_id` (the other side's `_dd_env.env_id`). |
| `label` | `TEXT` | Optional human label (e.g. `test`, `prod`). |
| `base_url` | `TEXT` | Where to reach the peer (e.g. `http://localhost:3001` or `https://tenant.example.com`). |
| `peer_secret_ciphertext` | `TEXT` | Sealed shared secret (hex). |
| `peer_secret_iv` | `TEXT` | AES-GCM IV (hex). |
| `peer_secret_tag` | `TEXT` | AES-GCM auth tag (hex). |
| `require_tls` | `INTEGER` | TLS-required flag (stored as 0/1). |
| `created_at` | `TEXT` | ISO 8601 creation time. |
| `last_seen_at` | `TEXT` | ISO 8601 time of the last verified inbound request from this peer; `null` until first contact. |

### The sealed shared secret

The shared secret is 32 random bytes (`randomSecret()`, `lib/crypto.js`) and is
never stored in plaintext. At rest it is sealed with AES-256-GCM (`seal()`,
`lib/crypto.js`) and split across the three `peer_secret_*` columns as hex. The
hex-text storage is deliberate: Saltcorn's SQLite insert layer JSON-stringifies
object values, which would mangle a raw `Buffer` column (`lib/schema.js`).

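A minimal sketch of the seal/open round trip, assuming a 12-byte IV and Node's default 16-byte GCM auth tag; the actual parameters in `lib/crypto.js` may differ, and `sealHex` / `openHex` are hypothetical names. It shows that every value written to the row is a plain hex string, so nothing depends on how the insert layer serializes a `Buffer`.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch: seal a secret with AES-256-GCM and return hex strings
// shaped like the three peer_secret_* columns. Not the lib/crypto.js code.
function sealHex(kek, plaintext) {
  const iv = crypto.randomBytes(12); // 96-bit IV, the usual choice for GCM
  const cipher = crypto.createCipheriv("aes-256-gcm", kek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return {
    peer_secret_ciphertext: ciphertext.toString("hex"),
    peer_secret_iv: iv.toString("hex"),
    peer_secret_tag: cipher.getAuthTag().toString("hex"), // 16 bytes by default
  };
}

// Reverse of sealHex; decipher.final() throws if the tag does not authenticate.
function openHex(kek, row) {
  const decipher = crypto.createDecipheriv(
    "aes-256-gcm",
    kek,
    Buffer.from(row.peer_secret_iv, "hex"),
  );
  decipher.setAuthTag(Buffer.from(row.peer_secret_tag, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(row.peer_secret_ciphertext, "hex")),
    decipher.final(),
  ]);
}
```

Because GCM authenticates the ciphertext, a corrupted column or the wrong KEK makes `decipher.final()` throw rather than silently yield wrong key bytes.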

The 32-byte key-encryption key (KEK) used by `seal`/`open` is derived once per
process via HKDF-SHA256 from `SALTCORN_SESSION_SECRET` (`getKek()`,
`lib/crypto.js`), falling back to the Saltcorn `session_secret` config. Because
the KEK is tied to the session secret, rotating `SALTCORN_SESSION_SECRET`
invalidates every stored pairing: existing ciphertexts no longer decrypt
(documented in `lib/crypto.js`).

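A sketch of the KEK derivation using Node's built-in `crypto.hkdfSync`. The salt and `info` values are placeholders, not the ones `getKek()` uses, and the `session_secret` config fallback is not modelled. The sketch shows why rotation breaks pairings: a different session secret yields a different 32-byte KEK.

```javascript
const crypto = require("node:crypto");

// Placeholder HKDF parameters; the real values in lib/crypto.js may differ.
const HKDF_SALT = Buffer.alloc(0);
const HKDF_INFO = Buffer.from("example-dev-deploy-kek");

// Derive a 32-byte AES-256 key from the session secret with HKDF-SHA256.
function deriveKek(sessionSecret) {
  // hkdfSync returns an ArrayBuffer; wrap it so it can go to createCipheriv.
  return Buffer.from(
    crypto.hkdfSync("sha256", sessionSecret, HKDF_SALT, HKDF_INFO, 32),
  );
}

// Cache per process, mirroring "derived once per process" above.
let cachedKek = null;
function getKekOnce() {
  if (!cachedKek) {
    const secret = process.env.SALTCORN_SESSION_SECRET;
    if (!secret) throw new Error("SALTCORN_SESSION_SECRET is not set");
    cachedKek = deriveKek(secret);
  }
  return cachedKek;
}
```

With the per-process cache, a changed environment variable only takes effect after a restart. From then on, every row sealed under the old KEK fails to open.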

Plaintext only crosses the process boundary at two moments:

- At pairing time, when the operator copies the secret into the other side's
  pairing form.
- At HMAC sign/verify time, when `peerSecret()` (`lib/peers.js`) opens the
  sealed bytes to compute or check a signature.

`rowToPeer()` (`lib/peers.js`) deliberately omits the sealed columns from the
plain accessor; callers must go through `peerSecret()` / `peerSecretByEnvId()`.

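One way to express that split, as a sketch: a hypothetical `rowToPeerSketch` copies an allowlist of the non-secret columns from the table above. Code that holds a peer object then cannot accidentally log or serialize the ciphertext. Converting `require_tls` to a boolean is an assumption of this sketch.

```javascript
// Non-secret _dd_peers columns, taken from the table above.
const PUBLIC_COLUMNS = [
  "peer_id",
  "env_id",
  "label",
  "base_url",
  "require_tls",
  "created_at",
  "last_seen_at",
];

// Hypothetical plain accessor: copy only allowlisted columns, never peer_secret_*.
function rowToPeerSketch(row) {
  const peer = {};
  for (const col of PUBLIC_COLUMNS) peer[col] = row[col] ?? null;
  peer.require_tls = Boolean(peer.require_tls); // stored as 0/1
  return peer;
}
```

An allowlist, rather than deleting known secret keys, keeps any future sealed column out of the plain accessor by default.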
## Pairing flow

Pairing is symmetric: each side ends up with a `_dd_peers` row pointing at the
other side's `env_id` and `base_url`, and both rows seal the *same* shared
secret. One side generates the secret; the operator pastes it into the other.

Each instance's own `env_id` is shown on its Peers page (`peersView`,
`lib/routes.js`): "This instance's env_id is ... Paste this into the other
instance's peer form." The `env_id` itself is a random UUID minted once at
bootstrap (`lib/env.js`).

Steps:

1. On instance A, open `/admin/dev-deploy/peers` and submit the **Add peer**
   form (`peersAdd`, `lib/routes.js`) with B's `env_id`, an optional `label`,
   B's `base_url`, and the optional `require_tls` checkbox. Leave **Existing
   secret** blank.
2. `addPeer` (`lib/peers.js`) generates a fresh 32-byte secret, seals it, and
   inserts the row. The plaintext secret is rendered once, as 64 hex
   characters, on the confirmation page (`lib/routes.js`), which warns that
   "it will not be shown again."
3. On instance B, open its own Peers page and submit **Add peer** with A's
   `env_id` and A's `base_url`, pasting the 64-hex secret into the **Existing
   secret** field. `peersAdd` validates it against `/^[0-9a-fA-F]{64}$/`
   (`lib/routes.js`) and passes it to `addPeer` as `existingSecret`, so B
   seals the identical secret instead of generating a new one.

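The generate-or-accept choice in steps 2 and 3 can be sketched as follows. `resolvePairingSecret` is a hypothetical helper, not the `addPeer` signature. It reuses the same 64-hex pattern that `peersAdd` validates against.

```javascript
const crypto = require("node:crypto");

const HEX64 = /^[0-9a-fA-F]{64}$/; // same pattern peersAdd validates against

// Hypothetical helper: generate a fresh secret on side A, or accept the pasted
// one on side B, so both sides end up holding identical bytes.
function resolvePairingSecret(existingSecret) {
  if (existingSecret === undefined || existingSecret === "") {
    const secret = crypto.randomBytes(32);
    // The hex form is what the confirmation page shows exactly once.
    return { secret, showOnce: secret.toString("hex") };
  }
  if (!HEX64.test(existingSecret)) {
    throw new Error("existing secret must be 64 hex characters");
  }
  return { secret: Buffer.from(existingSecret, "hex"), showOnce: null };
}
```

Usage mirrors the steps: A calls it with an empty field and displays `showOnce`, and B calls it with that string and gets the same 32 bytes to seal.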

After both rows exist, A and B share one secret and each knows the other's
`env_id` and `base_url`.

`env_id` is enforced UNIQUE, so re-adding the same peer fails with "peer with
env_id ... already exists" (`lib/peers.js`).

### Rotation and deletion

- **Rotate** (`peersRotate`, `lib/routes.js` -> `rotatePeerSecret`,
  `lib/peers.js`) mints a new secret for an existing peer, re-seals it, and
  shows the new value once. The operator must paste the new secret on the
  other side (re-pair or rotate there), or the pairing breaks.
- **Delete** (`peersDelete`, `lib/routes.js` -> `deletePeer`, `lib/peers.js`)
  removes the `_dd_peers` row *and* that peer's `_dd_anchors` rows, so a later
  re-pair starts syncing from the epoch again.

## The HMAC wire protocol

Every machine-API request is signed with the shared secret using HMAC-SHA256.
The outbound side is `lib/transport.js`; the inbound check is `requirePeerAuth`
(`lib/peerAuth.js`).

### Headers

| Header | Source | Meaning |
| --- | --- | --- |
| `X-DD-Env-Id` | sender's own `env_id` | Caller identity; the receiver looks it up in `_dd_peers` via `findPeerByEnvId` to find the matching secret. |
| `X-DD-Timestamp` | `String(Date.now())` | Milliseconds since the epoch. |
| `X-DD-Nonce` | `randomNonce().toString("hex")` | 16 random bytes, hex-encoded. Signed, but not checked against a seen-nonce store (see [Verification order](#verification-order)). |
| `X-DD-Signature` | `sign(secret, canonical)` | Hex HMAC-SHA256 over the canonical string. |

All four headers are required; a missing one returns `400 missing header ...`
(`lib/peerAuth.js`).


When there is a request body, the sender sets
`Content-Type: application/vnd.dev-deploy+json` (`lib/transport.js`). This
custom type stops Saltcorn's `express.json()` middleware from consuming the
request stream, so the receiver can read the exact raw bytes and HMAC them
verbatim. There is no re-serialization and no assumption about whitespace or
key order (`lib/peerAuth.js`).

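The following shows why the receiver must hash the raw bytes rather than a re-serialized object. Two JSON texts that parse to the same value can hash differently. A parse-then-stringify step on the receiver would therefore break the signature of an otherwise valid sender. The body shown is illustrative.

```javascript
const crypto = require("node:crypto");

function sha256Hex(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// What a sender might put on the wire (extra whitespace is legal JSON).
const wireBody = '{ "ops": [ {"op_id": "a1"} ] }';

// What a receiver would get by parsing and re-serializing instead of reading raw bytes.
const reserialized = JSON.stringify(JSON.parse(wireBody));

// Hashing the exact bytes (string or Buffer) matches; the re-serialized text does not.
const rawMatches = sha256Hex(wireBody) === sha256Hex(Buffer.from(wireBody, "utf8"));
const reserializedMatches = sha256Hex(wireBody) === sha256Hex(reserialized);
```

`rawMatches` is `true` and `reserializedMatches` is `false`, even though both texts describe the same `{ ops }` payload.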
### The canonical string

Both sides build the signed string with `buildCanonical` (`lib/crypto.js`).
It is six fields joined by newlines (`\n`):

```
timestamp
nonce
METHOD
path
targetHost
sha256hex(body)
```

- `METHOD` is uppercased.
- `path` is the request path including the query string. Outbound it is the
  literal `path` argument; inbound it is `req.originalUrl || req.url`
  (`lib/peerAuth.js`).
- `body` is hashed with SHA-256 (`sha256Hex`, `lib/crypto.js`); an empty body
  hashes the empty string. GET and HEAD requests never carry a body
  (`lib/peerAuth.js`).

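A sketch of the signing side under the rules above. `buildCanonicalSketch` and `signRequest` are illustrative names, not the exports of `lib/crypto.js` or `lib/transport.js`. `targetHost` is assumed to be normalized already. Field order, the uppercased method, and the SHA-256 of the body (empty string when absent) follow the description.

```javascript
const crypto = require("node:crypto");

const sha256Hex = (data) => crypto.createHash("sha256").update(data).digest("hex");

// Six fields, newline-joined, in the documented order.
function buildCanonicalSketch({ timestamp, nonce, method, path, targetHost, body }) {
  return [
    timestamp,
    nonce,
    method.toUpperCase(),
    path, // includes the query string, exactly as sent
    targetHost, // assumed already normalized
    sha256Hex(body || ""), // no body hashes the empty string
  ].join("\n");
}

// Produce the four X-DD-* headers for one request.
function signRequest(secret, envId, { method, path, targetHost, body }) {
  const timestamp = String(Date.now());
  const nonce = crypto.randomBytes(16).toString("hex");
  const canonical = buildCanonicalSketch({ timestamp, nonce, method, path, targetHost, body });
  const signature = crypto.createHmac("sha256", secret).update(canonical).digest("hex");
  return {
    "X-DD-Env-Id": envId,
    "X-DD-Timestamp": timestamp,
    "X-DD-Nonce": nonce,
    "X-DD-Signature": signature,
  };
}
```

The receiver recomputes the same HMAC from the headers, its own view of the path and host, and the raw body bytes. Any difference in any field changes the MAC.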
### Host binding (anti-cross-tenant replay)

`targetHost` is the normalized host the request is aimed at. Binding it into
the signature is what stops a request signed for one tenant from being
replayed against another tenant on the same multi-tenant server.

- Outbound, the host is derived from the peer's `base_url`:
  `normalizeHost(new URL(baseUrl).host)` (`lib/transport.js`).
- Inbound, it is derived from the request: prefer `X-Forwarded-Host` (first
  value, set by a trusted proxy), else the `Host` header, then normalized the
  same way (`lib/peerAuth.js`).

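A sketch of the inbound host derivation. It takes a plain headers object with lowercase keys, as Node's `http` module delivers them, instead of an Express request. `inboundTargetHost` is a hypothetical name. The inline `normalize` follows the `normalizeHost` rules described below: trim, lowercase, drop a trailing `:80` or `:443`.

```javascript
// Hypothetical normalizer following the documented normalizeHost rules.
function normalize(host) {
  return String(host).trim().toLowerCase().replace(/:(80|443)$/, "");
}

// headers: lowercase header names, as in Node's IncomingMessage.headers.
function inboundTargetHost(headers) {
  const forwarded = headers["x-forwarded-host"];
  // X-Forwarded-Host may carry a comma-separated chain; take the first entry.
  const raw = forwarded ? String(forwarded).split(",")[0] : headers["host"] || "";
  return normalize(raw);
}
```

Trusting `X-Forwarded-Host` is only safe when a proxy you control sets it. Otherwise a client could choose the host that goes into the canonical string.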

`normalizeHost` (`lib/crypto.js`) lowercases, trims, and drops a trailing
`:80` or `:443` so both sides produce byte-identical strings (clients omit the
default port from the `Host` header). Because the canonical string includes
`targetHost`, a signature computed for `t1.example.com` will not verify when
the same bytes are re-sent to `t2.example.com`. The receiver rebuilds the
canonical string with its own host, the MAC differs, and verification fails
with `401 bad signature`.

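A self-contained demonstration of the cross-tenant property, assuming the canonical layout described earlier. The same timestamp, nonce, method, path, and body are signed for `t1.example.com`. The resulting MAC verifies for any spelling of that host that normalizes identically, and fails once the receiver rebuilds the canonical string with `t2.example.com`. Helper names are illustrative.

```javascript
const crypto = require("node:crypto");

const normalizeHostSketch = (h) => String(h).trim().toLowerCase().replace(/:(80|443)$/, "");
const sha256Hex = (d) => crypto.createHash("sha256").update(d).digest("hex");

// HMAC over the six-field canonical string, with the host normalized first.
function mac(secret, host, f) {
  const canonical = [f.timestamp, f.nonce, f.method, f.path,
    normalizeHostSketch(host), sha256Hex(f.body)].join("\n");
  return crypto.createHmac("sha256", secret).update(canonical).digest();
}

// Constant-time compare; timingSafeEqual throws on length mismatch, so check first.
function verify(secret, host, f, signatureHex) {
  const expected = mac(secret, host, f);
  const given = Buffer.from(signatureHex, "hex");
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

const secret = crypto.randomBytes(32);
const fields = { timestamp: "1700000000000", nonce: "ab", method: "POST",
  path: "/dev-deploy/api/ingest", body: '{"ops":[]}' };
const sig = mac(secret, "t1.example.com", fields).toString("hex");

const sameTenant = verify(secret, "T1.Example.com:443", fields, sig); // true
const otherTenant = verify(secret, "t2.example.com", fields, sig); // false
```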

Note (`lib/peerAuth.js`): the receiver derives the host from the request, NOT
from `peerRow.base_url`. Inbound, `base_url` is the *sender's* address (used for
pull-back), not the receiver's own host.

### Verification order

`requirePeerAuth` (`lib/peerAuth.js`) runs the following checks in order. On
the first failure it sends a 4xx response and returns `null`:

1. All four required headers are present, else `400`.
2. The timestamp is within the +/- 5 minute skew window
   (`timestampWithinSkew`, `lib/crypto.js`;
   `SKEW_TOLERANCE_MS = 5 * 60 * 1000`, `lib/crypto.js`), else
   `401 timestamp out of skew window`.
3. `X-DD-Env-Id` resolves to a `_dd_peers` row, else
   `401 unknown peer env_id`.
4. The peer has a sealed secret that opens, else `401 peer not provisioned`.
5. The signature matches under a constant-time compare (`verifySignature`,
   `lib/crypto.js`, which uses `crypto.timingSafeEqual`), else
   `401 bad signature`.
6. If there was a body, it parses as JSON, else `400 body is not valid JSON`.
   This check runs after the signature has already covered the raw bytes.

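A framework-free sketch of the same check order, assuming the caller supplies the raw body, a normalized `host`, and a hypothetical `lookupPeer(envId)` returning `{ row, secret }` (with `secret` null when unprovisioned) or `null`. `verifyPeerRequest` and its result shape are illustrative. The exact text after `missing header` is an assumption.

```javascript
const crypto = require("node:crypto");

const SKEW_TOLERANCE_MS = 5 * 60 * 1000;
const REQUIRED = ["x-dd-env-id", "x-dd-timestamp", "x-dd-nonce", "x-dd-signature"];
const sha256Hex = (d) => crypto.createHash("sha256").update(d).digest("hex");

// Hypothetical version of the documented order; returns a status instead of responding.
function verifyPeerRequest(req, lookupPeer, now = Date.now()) {
  for (const name of REQUIRED) { // 1. headers
    if (!req.headers[name]) return { status: 400, error: `missing header ${name}` };
  }
  const ts = Number(req.headers["x-dd-timestamp"]); // 2. skew window
  if (!Number.isFinite(ts) || Math.abs(now - ts) > SKEW_TOLERANCE_MS) {
    return { status: 401, error: "timestamp out of skew window" };
  }
  const peer = lookupPeer(req.headers["x-dd-env-id"]); // 3. known peer
  if (!peer) return { status: 401, error: "unknown peer env_id" };
  if (!peer.secret) return { status: 401, error: "peer not provisioned" }; // 4. secret
  const raw = req.rawBody || ""; // 5. signature over the raw bytes
  const canonical = [req.headers["x-dd-timestamp"], req.headers["x-dd-nonce"],
    req.method.toUpperCase(), req.path, req.host, sha256Hex(raw)].join("\n");
  const expected = crypto.createHmac("sha256", peer.secret).update(canonical).digest();
  const given = Buffer.from(req.headers["x-dd-signature"], "hex");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return { status: 401, error: "bad signature" };
  }
  let body = null; // 6. JSON only after the MAC has covered the bytes
  if (raw.length > 0) {
    try { body = JSON.parse(raw); } catch { return { status: 400, error: "body is not valid JSON" }; }
  }
  return { status: 200, peer: peer.row, body };
}
```

The cheap, secret-free checks run first. JSON parsing, which is the only step that interprets attacker-supplied content, runs last.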

On success it parses the body into `req.body`, advances the peer's
`last_seen_at` (`touchPeerLastSeen`, `lib/peers.js`), sets `req.dd_peer` to
the peer row, and returns that row.

The nonce is sent and signed, but the current code keeps no server-side cache
of seen nonces. Replay protection therefore rests on the skew window and the
host binding. An identical request re-sent to the same host within the window
would still verify.

## Promote, pull, and anchors

Sync state is tracked per peer and per direction in `_dd_anchors`
(`lib/schema.js`):

| Column | Meaning |
| --- | --- |
| `peer_id` | FK-by-convention to `_dd_peers.peer_id` (PK part). |
| `direction` | `outbound` or `inbound` (PK part). |
| `last_op_id` | The last op id synced in that direction for that peer. |
| `updated_at` | ISO 8601 time of the last advance. |

`PRIMARY KEY (peer_id, direction)` means there is at most one outbound and one
inbound watermark per peer.

Both directions select only ops authored by the env that serves them
(`source_env_id = env.env_id`), and only those whose `created_at` is later
than the anchor op's `created_at`. Promote runs this selection locally; pull
asks the peer's `apiJournal` to run it on the peer's side. If there is no
anchor, sync starts from the epoch (the whole journal). Helpers:
`getOutboundAnchor` / `getInboundAnchor` / `upsertAnchor` (`lib/routes.js`).

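An in-memory sketch of that anchor-relative selection. `opsAfterAnchor` is hypothetical, and the sketch assumes uniformly formatted UTC ISO 8601 strings, so lexicographic order equals time order. Falling back to the epoch when the anchor op id is not found is this sketch's choice; the real helpers may behave differently.

```javascript
// ops: [{ op_id, source_env_id, created_at }] with UTC ISO 8601 created_at strings.
function opsAfterAnchor(ops, localEnvId, anchorOpId, limit) {
  const anchorOp = anchorOpId ? ops.find((op) => op.op_id === anchorOpId) : null;
  const after = anchorOp ? anchorOp.created_at : null; // null means "from the epoch"
  return ops
    .filter((op) => op.source_env_id === localEnvId) // only ops this env authored
    .filter((op) => after === null || op.created_at > after) // strictly after the anchor
    .sort((a, b) => (a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0)) // oldest first
    .slice(0, limit); // e.g. 500 for promote, 1000 for apiJournal
}
```

Because the comparison is strict, an op whose `created_at` exactly equals the anchor op's would not be selected. Whether that can occur depends on timestamp resolution.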
### Promote (push ops to a peer)

`promote` (`lib/routes.js`):

1. Look up the peer and the local env; read the outbound anchor.
2. Select the local env's ops after the anchor, oldest first, `LIMIT 500`
   (`lib/routes.js`). If there are none, redirect with "no ops to promote".
3. `signedFetch` `POST /dev-deploy/api/ingest` with `{ ops }` and the peer's
   secret (`lib/routes.js`).
4. On success, advance the outbound anchor to the last op's `op_id`
   (`upsertAnchor(peerId, "outbound", ...)`, `lib/routes.js`).
5. Summarize applied/error counts from the response and append any
   plugin-version warnings from `diffPluginsWithPeer` (`lib/routes.js`), which
   calls `/dev-deploy/api/health`.

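One promote round can be sketched with its I/O injected so it runs standalone. `promoteOnce` is hypothetical. `selectOps` stands in for the SQL selection and `send` stands in for the signed `ingest` call. The anchor store is a `Map` keyed by peer and direction. The response summary is omitted.

```javascript
// Hypothetical sketch of one promote round.
// selectOps(anchorOpId, limit) -> ops; send(ops) -> Promise<ingest response>.
async function promoteOnce({ peerId, anchors, selectOps, send, limit = 500 }) {
  const key = `${peerId}:outbound`;
  const ops = selectOps(anchors.get(key) || null, limit);
  if (ops.length === 0) return { message: "no ops to promote" };
  // If send rejects (HTTP or auth failure), we never reach the anchor update.
  const response = await send(ops);
  anchors.set(key, ops[ops.length - 1].op_id); // advance only after success
  return { message: `promoted ${ops.length} ops`, response };
}
```

Advancing the anchor only after a successful send means a failed round is simply retried from the same watermark on the next promote.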

`planView` (`lib/routes.js`) is the dry run: it makes the same anchor-relative
selection (`LIMIT 500`) but renders it as a preview table instead of sending it.

The receiving side, `apiIngest` (`lib/routes.js`), authenticates, applies the
batch with `applyBatch`, and advances *its* `inbound` anchor for the sender to
the last received `op_id` (`lib/routes.js`).

### Pull (fetch a peer's ops)

`pull` (`lib/routes.js`):

1. Read the inbound anchor and build the path
   `/dev-deploy/api/journal?since=<last_op_id>` (without `since` if there is
   no anchor) (`lib/routes.js`).
2. `signedFetch` `GET` that path (`lib/routes.js`).
3. Apply the returned `ops` with `applyBatch` (`lib/routes.js`).
4. Advance the inbound anchor to the last pulled op's `op_id`
   (`lib/routes.js`).
5. Summarize applied/error/conflict counts and plugin warnings.

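Steps 1 and 4 can be sketched as two small hypothetical helpers. Because the canonical string signs the path including its query string, the path should be built once and used both for the URL and for signing. `encodeURIComponent` is a defensive assumption here; UUID op ids contain no characters that need escaping. Keeping the current anchor when nothing was pulled is also this sketch's choice.

```javascript
const JOURNAL_PATH = "/dev-deploy/api/journal";

// Step 1: build the journal path from the inbound anchor (no anchor: whole journal).
function journalPath(inboundAnchorOpId) {
  if (!inboundAnchorOpId) return JOURNAL_PATH;
  return `${JOURNAL_PATH}?since=${encodeURIComponent(inboundAnchorOpId)}`;
}

// Step 4: the next inbound anchor is the last pulled op's id, or unchanged if none.
function nextInboundAnchor(currentAnchor, pulledOps) {
  return pulledOps.length > 0 ? pulledOps[pulledOps.length - 1].op_id : currentAnchor;
}
```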

The serving side, `apiJournal` (`lib/routes.js`), returns the local env's ops
after `since` (resolved to that op's `created_at`), oldest first,
`LIMIT 1000` (`lib/routes.js`), as `{ source_env_id, ops }`.

## Mixed-topology peering

A standalone instance and a specific tenant on a multi-tenant server peer the
same way as two standalone instances; the only difference is the `base_url`.

- Address the tenant by its tenant hostname as `base_url`, e.g.
  `https://tenant.example.com`. Saltcorn routes the request to that tenant by
  host, and dev-deploy's tables are schema-qualified per tenant
  (`db.getTenantSchemaPrefix()` is used throughout, e.g. `lib/routes.js`), so
  the peer row, ops, and anchors all live in that tenant's schema.
- The host binding makes this safe: the signature is computed over the tenant
  hostname (outbound from `base_url`; inbound from `X-Forwarded-Host` /
  `Host`). A request signed for one tenant cannot be replayed against another
  tenant on the same server, because each tenant's host produces a different
  canonical string (see [Host binding](#host-binding-anti-cross-tenant-replay)).
- Each side still stores the other's `env_id` and `base_url` in its own
  `_dd_peers`. A standalone instance points `base_url` at the tenant's
  hostname; the tenant points `base_url` back at the standalone instance's
  hostname.

If a reverse proxy fronts the tenants, it must set `X-Forwarded-Host` to the
tenant hostname so the inbound canonical string matches the outbound one
(`lib/peerAuth.js`).

## Endpoint reference

All four machine-API routes are registered with `noCsrf: true`
(`lib/routes.js`) and require HMAC peer auth via `requirePeerAuth`. The admin
peer and sync routes require a session with the admin role (`role_id === 1`,
`isAdmin`, `lib/routes.js`) and use CSRF fields.

### Machine API (HMAC peer auth)

| Method | Path | Handler | File | Purpose |
| --- | --- | --- | --- | --- |
| GET | `/dev-deploy/api/journal?since=op_id` | `apiJournal` | `lib/routes.js` | Return local env ops after `since`, oldest first, max 1000. Returns `{ source_env_id, ops }`. |
| POST | `/dev-deploy/api/ingest` | `apiIngest` | `lib/routes.js` | Apply `{ ops }` from a peer; advance that peer's inbound anchor. Returns `{ received, results }`. |
| GET | `/dev-deploy/api/file/:uuid` | `apiFile` | `lib/routes.js` | Stream a file entity's bytes by UUID (octet-stream). 404 if there is no `_dd_entity_ids` mapping of kind `file`. |
| GET | `/dev-deploy/api/health` | `apiHealth` | `lib/routes.js` | Return `{ env_id, label, plugins }` for plugin-drift checks. |

### Admin peer and sync routes (session + admin role)

| Method | Path | Handler | File | Purpose |
| --- | --- | --- | --- | --- |
| GET | `/admin/dev-deploy/peers` | `peersView` | `lib/routes.js` | List peers, show this env's `env_id`, add-peer form. |
| POST | `/admin/dev-deploy/peers/add` | `peersAdd` | `lib/routes.js` | Pair a peer; generate or accept a 64-hex secret. |
| POST | `/admin/dev-deploy/peers/rotate` | `peersRotate` | `lib/routes.js` | Rotate a peer's shared secret (shown once). |
| POST | `/admin/dev-deploy/peers/delete` | `peersDelete` | `lib/routes.js` | Delete a peer and its anchors. |
| GET | `/admin/dev-deploy/plan` | `planView` | `lib/routes.js` | Preview ops that would be promoted to a peer. |
| POST | `/admin/dev-deploy/promote` | `promote` | `lib/routes.js` | Push outbound ops to a peer via signed `ingest`. |
| POST | `/admin/dev-deploy/pull` | `pull` | `lib/routes.js` | Pull a peer's ops via signed `journal` and apply them. |

## File reference

| File | Responsibility |
| --- | --- |
| `lib/peers.js` | `_dd_peers` CRUD; seal/open the shared secret; `peerSecret`, `addPeer`, `rotatePeerSecret`, `deletePeer`, `touchPeerLastSeen`. |
| `lib/crypto.js` | AES-256-GCM seal/open, HKDF KEK, HMAC sign/verify, `buildCanonical`, `normalizeHost`, skew check, random secret/nonce. |
| `lib/transport.js` | Outbound signed requests: `signedFetch` (JSON) and `signedFetchBinary` (raw bytes). |
| `lib/peerAuth.js` | Inbound `requirePeerAuth`: header check, skew, peer lookup, raw-body HMAC verify, host binding. |
| `lib/routes.js` | Admin UI for pairing/plan/promote/pull and the four machine-API handlers. |
| `lib/schema.js` | `_dd_peers` (`:38`) and `_dd_anchors` (`:116`) table definitions. |