sc-idp/docs/operations.md
2026-06-01 16:40:54 -05:00

# Operations & Deployment
How to run, install, and operate the `saltcorn-idp` plugin in the local
development setup. For protocol/internal details see the sibling docs:
[OIDC](./oidc.md), [LDAP](./ldap.md), [SAML](./saml.md),
[architecture](./architecture.md), [configuration](./configuration.md).
All commands below assume the project root:
```
/home/scott/claude/saltcorn
```
The upstream Saltcorn checkout lives in `saltcorn/` under that root; the plugin
source lives in `idp/`. Each `env.sh` prepends
`saltcorn/packages/saltcorn-cli/bin` to `PATH` so `saltcorn ...` resolves to the
in-tree CLI.
## The three dev instances
There are three parallel Saltcorn dev instances, each with its own start script,
state directory, and `env.sh`. They are intentionally isolated (distinct ports,
session stores, session secrets) so they can run at the same time.
| Instance | Script | HTTP port | Backend | State dir | LDAPS port |
|----------|--------|-----------|---------|-----------|------------|
| MAIN | `startServer.sh` | 3000 (default) | SQLite | `.dev-state/` | 1636 |
| TEST | `startServerTest.sh` | 3001 (`SALTCORN_PORT`) | SQLite | `.dev-state-test/` | none |
| PG | `startServerPg.sh` | 3002 (`-p 3002`) | Postgres, multi-tenant | `.dev-state-pg/` | 1637 |
Each start script `cd`s into its state directory before running
`saltcorn serve`. This is deliberate: on SQLite, Saltcorn's session store writes
`sessions.sqlite` at the process current working directory
(`packages/server/routes/utils.js`, the `db.isSQLite` branch), so running from
inside the state dir gives each SQLite instance its own `sessions.sqlite`
alongside its `saltcorn.sqlite`. (PG uses a Postgres-backed session store; see
below.)
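To illustrate why the working directory matters, here is a minimal sketch. It assumes, as described above, that the SQLite session store opens `sessions.sqlite` relative to the process working directory. `sessionsFileFor` is a hypothetical helper for illustration, not a Saltcorn function.

```javascript
const path = require("node:path");

// Project root and state dirs as given in this document.
const ROOT = "/home/scott/claude/saltcorn";
const STATE_DIRS = { MAIN: ".dev-state", TEST: ".dev-state-test" };

// Hypothetical helper: where a cwd-relative "sessions.sqlite" ends up
// for a process whose working directory is `cwd`.
function sessionsFileFor(cwd) {
  return path.resolve(cwd, "sessions.sqlite");
}

for (const [name, dir] of Object.entries(STATE_DIRS)) {
  console.log(name, sessionsFileFor(path.join(ROOT, dir)));
}

// If both scripts ran from ROOT instead, both instances would share one file.
console.log("shared", sessionsFileFor(ROOT));
```

Running each server from its own state directory is what keeps the two paths distinct.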
### What each env.sh sets
Common to all three: `NVM_DIR` (used to source nvm and pick up Node) and the
`PATH` prepend for the in-tree CLI.
**MAIN -- `.dev-state/env.sh`**
| Variable | Value | Purpose |
|----------|-------|---------|
| `SQLITE_FILEPATH` | `.dev-state/saltcorn.sqlite` | Forces the SQLite backend; DB file location |
| `SALTCORN_FILE_STORE` | `.dev-state/files` | Uploaded-file directory |
| `SALTCORN_SESSION_SECRET` | `32552b95...410151` | Session/cookie key; also the IdP KEK + oidc cookie-key source |
| `SALTCORN_IDP_LDAP_PORT` | `1636` | Enables the LDAPS listener on 1636. Only MAIN sets this |
**TEST -- `.dev-state-test/env.sh`**
| Variable | Value | Purpose |
|----------|-------|---------|
| `SQLITE_FILEPATH` | `.dev-state-test/saltcorn.sqlite` | TEST's own SQLite DB |
| `SALTCORN_FILE_STORE` | `.dev-state-test/files` | TEST's own file store |
| `SALTCORN_SESSION_SECRET` | `cde4d5ce...86265` | Different secret from MAIN |
| `SALTCORN_PORT` | `3001` | HTTP listen port (passed via `saltcorn serve -p "$SALTCORN_PORT"`) |
TEST does **not** set `SALTCORN_IDP_LDAP_PORT`, so it runs no LDAP listener.
**PG -- `.dev-state-pg/env.sh`**
| Variable | Value | Purpose |
|----------|-------|---------|
| `PGHOST` | `/var/run/postgresql` | Unix socket; peer auth (no real password) |
| `PGUSER` | `scott` | OS user matched by peer auth |
| `PGDATABASE` | `saltcorn_idp` | DB name (must already exist) |
| `PGPASSWORD` | `peer` | Dummy value; peer auth ignores it, but Saltcorn only selects Postgres when user+password+database are all set (`connect.ts getConnectObject`) |
| `SALTCORN_MULTI_TENANT` | `true` | Enables schema-per-tenant (Postgres only) |
| `SALTCORN_IDP_LDAP_PORT` | `1637` | LDAPS listener on 1637 (distinct from MAIN's 1636) |
| `SALTCORN_FILE_STORE` | `.dev-state-pg/files` | File store |
| `SALTCORN_SESSION_SECRET` | `3ca4fab8...41a` | Different secret from MAIN/TEST |
| `SALTCORN_JWT_SECRET` | `d379db38...f158f` | JWT signing secret |
PG deliberately does **not** set `SQLITE_FILEPATH`, so Saltcorn's
`getConnectObject()` selects Postgres.
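The selection rules described in the tables above can be sketched as follows. This is a hypothetical simplification, not Saltcorn's `getConnectObject()`; it only mirrors the two rules stated in this document, and the `null` fallback is an assumption made for the example.

```javascript
// Hypothetical simplification of the backend choice described in this document.
function selectBackend(env) {
  // Rule 1: SQLITE_FILEPATH forces the SQLite backend.
  if (env.SQLITE_FILEPATH) return "sqlite";
  // Rule 2: Postgres is selected only when user, password and database are
  // all set, which is why the PG env.sh exports a dummy PGPASSWORD.
  if (env.PGUSER && env.PGPASSWORD && env.PGDATABASE) return "postgres";
  // Assumption for this sketch: anything else means "no usable connection".
  return null;
}

const pgEnv = {
  PGHOST: "/var/run/postgresql",
  PGUSER: "scott",
  PGDATABASE: "saltcorn_idp",
  PGPASSWORD: "peer",
};
const pgEnvNoPassword = { PGHOST: pgEnv.PGHOST, PGUSER: pgEnv.PGUSER, PGDATABASE: pgEnv.PGDATABASE };

console.log(selectBackend(pgEnv)); // "postgres"
console.log(selectBackend(pgEnvNoPassword)); // null
console.log(selectBackend({ SQLITE_FILEPATH: ".dev-state/saltcorn.sqlite" })); // "sqlite"
```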
Note: on Postgres the session store is Postgres-backed, not SQLite. Saltcorn
only uses the `connect-sqlite3` `sessions.sqlite` store when `db.isSQLite`
(`packages/server/routes/utils.js`); on Postgres it uses `connect-pg-simple`
with a `_sc_session` table, so no `sessions.sqlite` is written for the PG
instance.
## Starting and stopping
Start each instance from the project root:
```bash
./startServer.sh # MAIN -> http://localhost:3000 (LDAPS :1636)
./startServerTest.sh # TEST -> http://localhost:3001 (no LDAP)
./startServerPg.sh # PG -> http://localhost:3002 (LDAPS :1637)
```
On boot, MAIN and TEST run `saltcorn install-plugin -d ./dev-deploy` (the
separate metadata-migration plugin) before serving; that install is per-instance
safe because each uses its own source dir. Failures there are non-fatal -- the
previously installed version still loads. `startServerPg.sh` does not run any
install on boot.
`saltcorn-idp` is **not** installed during boot of any instance -- see the next
section for why. Each instance loads it from a prior install.
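Stop an instance with Ctrl-C (the scripts `exec saltcorn serve`, so the foreground
process is the server). An instance started in the background with `&` does not
receive Ctrl-C; bring it forward with `fg` first, or `kill` its job. Stop all
instances before re-installing the plugin (see below).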
## Installing the plugin
### MAIN + TEST: reinstallIdp.sh
After editing `idp/` source, reinstall into both SQLite instances with the
dedicated script, then restart the servers:
```bash
./reinstallIdp.sh
./startServer.sh &
./startServerTest.sh &
```
`reinstallIdp.sh` sources each instance's `env.sh` in a subshell and runs
`saltcorn install-plugin -d "$IDP_DIR"` once per instance, where
`IDP_DIR="$PWD/idp"`.
### Why a separate script (not in startServer*.sh)
Three reasons, all in the script's header comment and code:
1. **Absolute `-d` path required.** `saltcorn install-plugin` passes the `-d`
argument through `path.join()` and then `require()`s the result. `path.join()`
normalizes away a leading `./`, so `./idp` becomes `idp`, which `require()`
resolves as a node module instead of a filesystem path, and the install fails.
The script therefore uses the absolute `IDP_DIR="$PWD/idp"` instead of `./idp`.
2. **EEXIST on existing node_modules symlinks.** `install-plugin` aborts with
`EEXIST` if the per-plugin-dir `node_modules` symlinks already exist (from a
prior install). `clearSymlinks()` removes them before each install so they can
be recreated cleanly:
```bash
clearSymlinks() {
find "$PLUGINS_ROOT" -maxdepth 2 -name node_modules -type l -delete 2>/dev/null || true
}
```
`PLUGINS_ROOT` is `$HOME/.local/share/saltcorn-plugins`.
3. **Shared plugins_folder race.** That `plugins_folder` is shared by both MAIN
and TEST. Doing the install in each start script would race on symlink
creation when both instances boot concurrently, so installs are centralized
here. Run it with the servers stopped.
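The first reason above can be reproduced with Node's `path` module alone. This is a minimal sketch that assumes only what the text states (the `-d` argument passes through `path.join()` before `require()`), and it assumes a POSIX system. `looksLikeBareModule` is a hypothetical helper that approximates how `require()` classifies a specifier.

```javascript
const path = require("node:path");

// path.join() normalizes its result, which drops a leading "./".
const relative = path.join("./idp"); // "idp"
const absolute = path.join("/home/scott/claude/saltcorn", "idp");

// Hypothetical, simplified check: require() treats a specifier as a file path
// only if it is absolute or starts with "./" or "../"; anything else is
// looked up as a module name.
function looksLikeBareModule(specifier) {
  return !(
    path.isAbsolute(specifier) ||
    specifier.startsWith("./") ||
    specifier.startsWith("../")
  );
}

console.log(relative, looksLikeBareModule(relative)); // idp true
console.log(absolute, looksLikeBareModule(absolute)); // /home/scott/claude/saltcorn/idp false
```

An absolute path survives normalization unchanged, which is why `"$PWD/idp"` works where `./idp` does not.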
### The additive-copy-does-not-prune gotcha
`saltcorn install-plugin` copies the `idp/` source into the shared
`plugins_folder` **additively**: new and modified files overwrite the installed
copy, but files you have **deleted** from `idp/` are **not** pruned from it. A
stale file can linger and still be loaded. `reinstallIdp.sh` clears the
node_modules symlinks but does not prune stale source files either. For a
guaranteed clean slate, delete the plugin's directory under
`~/.local/share/saltcorn-plugins/` before re-running the install.
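The effect is easy to see in a small in-memory model. In this sketch a "directory" is a plain object mapping file names to contents; `additiveCopy` and the file names are hypothetical and stand in for the copy step, not for Saltcorn's actual code.

```javascript
// Hypothetical model of an additive copy: it adds and overwrites files,
// but never deletes anything already present in the destination.
function additiveCopy(source, destination) {
  return { ...destination, ...source };
}

// First install: two source files.
const installed = additiveCopy({ "index.js": "v1", "old-route.js": "v1" }, {});

// old-route.js is deleted from the source and index.js is edited, then reinstalled.
const reinstalled = additiveCopy({ "index.js": "v2" }, installed);
console.log(Object.keys(reinstalled)); // [ 'index.js', 'old-route.js' ]

// Clean slate: the destination is emptied before the copy.
const clean = additiveCopy({ "index.js": "v2" }, {});
console.log(Object.keys(clean)); // [ 'index.js' ]
```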
### PG (multi-tenant): per-tenant install
The Postgres instance installs the plugin per tenant schema. Two steps:
1. **One-time, into the public schema** (so the shared `plugins_folder` copy
exists):
```bash
source .dev-state-pg/env.sh
saltcorn install-plugin -d "$PWD/idp"   # absolute path; a leading ./ fails (see above)
```
2. **Per tenant** (registers + enables the plugin in each tenant schema and runs
its `onLoad`):
```bash
./idp/scripts/installIdpTenant.sh t1 t2 # named tenants
./idp/scripts/installIdpTenant.sh '*' # all tenants
```
Prerequisites (from the script header): the tenants must already exist
(`saltcorn create-tenant <name>`), and the public-schema install above must have
run first.
`installIdpTenant.sh` `cd`s to the project root, sources `.dev-state-pg/env.sh`,
and runs `idp/scripts/installIdpTenant.js` with the tenant arguments.
What `installIdpTenant.js` does:
- `Plugin.loadAllPlugins()`, then resolves the target tenant list from
`process.argv` (or all tenants via `getAllTenants()` when given `*` or no
args), mapping each to its `subdomain`.
- `init_multi_tenant(Plugin.loadAllPlugins, true, tenants)` -- initializes
per-tenant State (so `getState()` resolves inside `runWithTenant`) without
running migrations; this also re-runs existing plugins' idempotent `onLoad`.
- `getRootState().setConfig("tenants_unsafe_plugins", true)` -- a root-only
config that permits installing this **local** (`-d`) plugin into tenant
schemas. In this Saltcorn build, `loadAndSaveNewPlugin` skips any non-`npm`
plugin on a non-root tenant before its `allowUnsafe` argument is consulted, so
this config is the supported lever; the CLI's
`install-plugin -t <tenant> -d <dir>` cannot do it. It is intended for a
multi-tenant deployment that offers the IdP plugin to its tenants.
- For each tenant, `installInto(tenant)` runs inside
`db.runWithTenant(tenant, ...)` within a transaction:
  - `db.deleteWhere("_sc_plugins", { name: "saltcorn-idp" })` first -- a
    **delete-then-insert** that removes any prior rows (including the old manual
    `_sc_plugins` SQL hack and earlier installs) so it converges on exactly one
    `_sc_plugins` row (one source of truth).
  - Creates a `new Plugin({ name: "saltcorn-idp", source: "local", location: IDP_DIR, configuration: {} })`
    and calls `Plugin.loadAndSaveNewPlugin(plugin, true, false)`.
  - Verifies by checking the `_sc_plugins` row exists and that
    `_idp_ldap_service` exists in the tenant's schema
    (`information_schema.tables`); throws if either is missing (meaning `onLoad`
    did not run).
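Why delete-then-insert converges on a single row can be shown with an in-memory stand-in for a tenant's `_sc_plugins` table. Everything in this sketch is hypothetical: `deleteRowsWhere` and `installRow` model the behavior described above and are not the Saltcorn `db` API, and `other-plugin` is an invented row.

```javascript
// Hypothetical model: remove every row whose fields match `where`.
function deleteRowsWhere(rows, where) {
  return rows.filter(
    (row) => !Object.entries(where).every(([key, value]) => row[key] === value)
  );
}

// Delete-then-insert: drop all prior rows for the plugin, then add exactly one.
function installRow(rows, plugin) {
  return [...deleteRowsWhere(rows, { name: plugin.name }), plugin];
}

const plugin = { name: "saltcorn-idp", source: "local" };

// Prior state with duplicates left by earlier installs, plus an unrelated row.
let table = [
  { name: "saltcorn-idp", source: "local" },
  { name: "saltcorn-idp", source: "local" },
  { name: "other-plugin", source: "npm" },
];

table = installRow(table, plugin);
table = installRow(table, plugin); // a second run leaves the table unchanged

console.log(table.filter((row) => row.name === "saltcorn-idp").length); // 1
```

Because the delete always runs first, the result does not depend on how many rows earlier installs left behind.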
On every subsequent boot of `startServerPg.sh`, per-tenant `onLoad` re-runs
automatically via `init_multi_tenant -> loadAllPlugins`; this is idempotent
(tables already exist, signing key already sealed, etc.), so no per-tenant
reinstall is needed on a normal restart -- only after editing the plugin source.
## Multi-tenant host routing
The PG instance is multi-tenant. The OIDC issuer is derived per request by
`issuerForReq()` (`lib/oidc/discovery.js`): it prefers the tenant's configured
`base_url` and falls back to `req.protocol + "://" + req.get("host")` when
`base_url` is unset. The SAML entity ID is that issuer plus `/saml`
(`lib/saml/idp.js`). See [architecture](./architecture.md) and [OIDC](./oidc.md).
In the dev/test setup the host conventions are:
- **Subdomain selects the tenant.** Saltcorn's multi-tenant mode uses a
subdomain offset of 1 (`packages/server/app.js`), so the leading label of a
host like `t1.localhost.localdomain:3002` selects the tenant schema `t1`.
- **Issuer comes from `base_url` (or the request host).** There is no automatic
host transform: the issuer is exactly `base_url + "/idp"` when `base_url` is
set, otherwise `<scheme>://<request-host>/idp`. Whatever value results must
match exactly what a relying party used to fetch
`/.well-known/openid-configuration`, so set `base_url` per tenant.
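The issuer rule above can be restated as a small sketch. `issuerFor` and `samlEntityId` are hypothetical helpers, not the plugin's `issuerForReq()`; `req` is a minimal stand-in for an Express request, and `https://t1.example.test` is an invented `base_url`. The concatenation is literal, as the text describes, so no trailing-slash handling is assumed.

```javascript
// Hypothetical restatement: prefer the configured base_url, otherwise fall
// back to the request's scheme and host, then append "/idp".
function issuerFor(baseUrl, req) {
  const origin = baseUrl || `${req.protocol}://${req.get("host")}`;
  return origin + "/idp";
}

// The SAML entity ID is the issuer plus "/saml".
function samlEntityId(issuer) {
  return issuer + "/saml";
}

// Minimal stand-in for an Express request object.
const req = {
  protocol: "http",
  get: (header) => (header === "host" ? "t1.localhost.localdomain:3002" : undefined),
};

console.log(issuerFor("", req)); // http://t1.localhost.localdomain:3002/idp
console.log(issuerFor("https://t1.example.test", req)); // https://t1.example.test/idp
console.log(samlEntityId(issuerFor("https://t1.example.test", req)));
```

The two calls give different issuers for the same request, which is why a relying party must use the same host that the tenant's `base_url` names.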
For LDAP, the tenant is encoded in the bind/search DN as an extra
`dc=<tenant>` component immediately before the base DN, e.g.
`uid=admin@t1.local,ou=people,dc=t1,dc=saltcorn,dc=local`. Single-tenant
(SQLite) uses the base DN with no tenant component. See [LDAP](./ldap.md).
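A minimal sketch of how such a DN is assembled. `userDn` is a hypothetical helper; the base DN and `ou=people` are taken from the example above, and `admin@example.local` is an invented single-tenant user. No RDN value escaping is done, so the inputs are assumed to contain no DN special characters.

```javascript
const BASE_DN = "dc=saltcorn,dc=local"; // from the example DN in this section

// Hypothetical helper: builds a user DN, inserting dc=<tenant> immediately
// before the base DN when a tenant is given.
function userDn(uid, tenant) {
  const parts = [`uid=${uid}`, "ou=people"];
  if (tenant) parts.push(`dc=${tenant}`);
  parts.push(BASE_DN);
  return parts.join(",");
}

console.log(userDn("admin@t1.local", "t1"));
// uid=admin@t1.local,ou=people,dc=t1,dc=saltcorn,dc=local
console.log(userDn("admin@example.local"));
// uid=admin@example.local,ou=people,dc=saltcorn,dc=local
```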
## Known issues
### Intermittent PG LDAP bind flake on :1637
On the Postgres multi-tenant instance, the cluster primary process occasionally
does **not** bind the LDAPS listener on `:1637` on a fresh boot. The listener is
simply absent; LDAP authentication is unavailable for that run.
This is distinct from the handled `EADDRINUSE`/`EACCES` retry case. The LDAP
server start path (`lib/ldap/server.js`, `listenWithRetry`) already retries
transient bind failures up to `LDAP_BIND_MAX_ATTEMPTS` (5) with a linear backoff
of `LDAP_BIND_RETRY_BASE_MS` (500ms) x attempt, binding only in the cluster
primary (`isPrimary()`), and logs a loud `LDAP ... UNAVAILABLE` warning on final
failure. The flake observed here is not that path -- it is the primary not
binding `:1637` at all on a fresh boot.
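For reference, the handled retry case has roughly the following shape. This is a hypothetical sketch, not the code in `lib/ldap/server.js`: `tryListen` and `sleep` are injected stand-ins, and it assumes that only `EADDRINUSE`/`EACCES` are retried, that attempts are numbered from 1, and that no wait follows the last attempt. The sketch rethrows on final failure, whereas the plugin is described as logging a warning.

```javascript
const LDAP_BIND_MAX_ATTEMPTS = 5;
const LDAP_BIND_RETRY_BASE_MS = 500;

// Linear backoff: the wait after a failed attempt grows with the attempt number.
function backoffDelayMs(attempt) {
  return LDAP_BIND_RETRY_BASE_MS * attempt;
}

// Hypothetical retry loop around an injected listen function.
async function listenWithRetrySketch(tryListen, sleep) {
  for (let attempt = 1; attempt <= LDAP_BIND_MAX_ATTEMPTS; attempt++) {
    try {
      return await tryListen();
    } catch (err) {
      const transient = err.code === "EADDRINUSE" || err.code === "EACCES";
      // Give up on non-transient errors or once the attempts are used up.
      if (!transient || attempt === LDAP_BIND_MAX_ATTEMPTS) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// Waits between the five attempts under these assumptions.
const delays = [1, 2, 3, 4].map((attempt) => backoffDelayMs(attempt));
console.log(delays); // [ 500, 1000, 1500, 2000 ]
```

Under these assumptions the retry path spends at most 5 seconds waiting before it gives up. In the flake described here no such failure is reported at all.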
Workaround: restart the PG instance; the listener comes up on the next boot.
Root cause is **unresolved**.
MAIN's LDAPS on `:1636` (SQLite, single-tenant) is not known to exhibit this.