
v0.0.45 — Enterprise SSO, per-org outbound, upgrade self-heal

Released: 2026-05-04. The first release auto-cut by the CircleCI docs job.

Three big themes for tenant admins (the SSO trifecta, per-org outbound surfaces, logical backup/restore), a long-overdue audit-log overhaul, and a substantial round of upgrade-machinery hardening so future deploys self-heal column drift and don't trip on SELinux or single-tier installs.

Enterprise SSO trifecta

Three new sign-in paths land alongside the existing Google OAuth, all admin-configured per-org with no operator code changes:

  • Microsoft (Azure AD / Entra ID) — works with personal MS accounts, work/school accounts, single-tenant directories, multi-tenant directories. Operator sets MICROSOFT_CLIENT_ID + MICROSOFT_TENANT (common / specific GUID / organizations / consumers); admin enables the button. Issuer check accepts both v1 (sts.windows.net/<tid>/) and v2 (login.microsoftonline.com/<tid>/v2.0) formats. JWKS cached 24h with cache-bust + retry on unknown kid.
  • Generic OIDC — paste an issuer URL + client_id; the platform follows the OIDC discovery spec for the rest. Works with Okta, Auth0, Keycloak, Ping, Cloudflare Access, anything OIDC-compliant. Per-(org_id, slug) addressing with a globally-unique slug, so the public sign-in URL doesn't carry org_id. Bait-endpoint defense: a discovery doc whose issuer field doesn't match what was configured fails fast — the platform never silently trusts whatever URL serves the discovery JSON.
  • SAML 2.0 — for ADFS / Azure AD SAML / Okta SAML / OneLogin / the long tail of old-line enterprise IdPs. Uses signxml for signature verification (canonicalization + XML Signature Wrapping defense). Cert paste handles both full PEM blocks AND bare base64 — the most common operator-paste mistake. Email resolution walks the common attribute aliases (xmlsoap claim, urn:oid:0.9.2342..., plain email, NameID fallback when email-shaped).
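
The bait-endpoint defense described for generic OIDC can be sketched as follows. This is a minimal illustration, not the platform's code: the function and exception names are hypothetical, and the exact-match comparison follows the OpenID Connect Discovery rule that the served issuer must equal the configured one.

```python
import json
from urllib.request import urlopen


class IssuerMismatch(ValueError):
    """Raised when a discovery document names a different issuer than configured."""


def validate_discovery(configured_issuer: str, discovery_doc: dict) -> dict:
    """Return the discovery document only if its issuer equals the configured issuer."""
    served = discovery_doc.get("issuer")
    if served != configured_issuer:
        # Fail fast: never trust whatever URL happened to serve the JSON.
        raise IssuerMismatch(
            f"configured {configured_issuer!r}, discovery served {served!r}"
        )
    return discovery_doc


def fetch_discovery(configured_issuer: str, timeout: float = 5.0) -> dict:
    """Fetch the well-known discovery document and validate it before use."""
    url = configured_issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urlopen(url, timeout=timeout) as resp:
        return validate_discovery(configured_issuer, json.load(resp))
```

Keeping the check in a separate function from the fetch makes it testable without a network call.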

All four SSO paths share identical conventions: auto-provision new users with role=viewer, bypass must_reset_password (SSO is the identity-of-record), and emit <provider>_login audit rows. Domain-allowlist enforcement is consistent: empty allowlist means "any verified account"; populated allowlist is enforced before the JWT mints.
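
The allowlist rule above (empty means any verified account, populated means the domain must match) could look roughly like this. The function name and the case-insensitive comparison are assumptions made for illustration; the check is meant to run before any JWT is minted.

```python
def email_domain_allowed(email: str, allowlist: list[str]) -> bool:
    """Apply the domain allowlist to an already-verified email address."""
    if not allowlist:
        # Empty allowlist: any verified account may sign in.
        return True
    _, sep, domain = email.rpartition("@")
    if not sep or not domain:
        # Not email-shaped, so it cannot match any allowed domain.
        return False
    allowed = {entry.strip().lower() for entry in allowlist}
    return domain.lower() in allowed
```

For example, `email_domain_allowed("ana@corp.example", ["corp.example"])` is true, while an address at any other domain is rejected.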

Per-org outbound — SMTP + webhooks

Two new tenant-controllable outbound paths, both admin-only and org-scoped at every step:

  • Per-org SMTP override (/smtp) — Cloud-tier tenants and Enterprise operators plug in their own SMTP host + credentials so invite + password-reset emails arrive from <team>@<their-domain> instead of the platform default. Better deliverability (recipient SPF/DKIM checks pass against the tenant's own DNS), better branding, no shared-relay quota. Reuses the LLM-key encryption derivation for the password — operators rotate once, both surfaces follow. The Test send button surfaces the SMTP error verbatim (502 + detail) so admins fix host/auth from the response, not the journal. Forgot-password resolves the user's primary org_id from user_orgs so reset emails route through the tenant relay even on the unauthenticated request.
  • Per-org webhooks (/webhooks): generic outbound event delivery. HMAC-signed payloads (sha256=<hex hmac>, same shape Stripe / GitHub use), async delivery, retry with increasing backoff (1m / 5m / 15m / 1h, then dropped), per-event-type subscription, and a delivery log with response code + excerpt for debugging. Empty events array means "all events" (Stripe convention). The dispatcher is best-effort: a DB or transport failure never raises into the calling business path, so an audit-log write still succeeds even when a tenant's endpoint is down. Secret echoed once on create / rotation; subsequent GETs only carry a 6-character preview tail.
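
A receiver can check the sha256=<hex hmac> signature along these lines. This is a hedged sketch: the function names are hypothetical, the header that carries the signature is not specified here, and the mapping from failed attempts to the 1m / 5m / 15m / 1h schedule is an assumption about how the retries are counted.

```python
import hashlib
import hmac
from typing import Optional

# Delays taken from the release notes: 1m / 5m / 15m / 1h, then dropped.
RETRY_DELAYS_SECONDS = [60, 300, 900, 3600]


def sign_payload(secret: str, body: bytes) -> str:
    """Build the signature value for a raw request body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, received: str) -> bool:
    """Constant-time comparison of the received signature against the expected one."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def next_retry_delay(failed_attempts: int) -> Optional[int]:
    """Seconds to wait after the Nth failed attempt, or None once the event is dropped."""
    if 1 <= failed_attempts <= len(RETRY_DELAYS_SECONDS):
        return RETRY_DELAYS_SECONDS[failed_attempts - 1]
    return None
```

Verification should run over the raw bytes as received, since re-serializing the JSON can change the digest.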

Per-org logical backup + restore

Tenant-driven backup of the org's own data (/backups). An admin can snapshot before a risky migration / mass delete / experiment, AND restore without filing a ticket.

  • Coverage. Every honeyframe.* table that carries an org_id column, plus per-tenant t<pid>_uploads.* tables. Schema discovery walks INFORMATION_SCHEMA on every backup so future schema migrations are auto-included. A deny-list keeps audit_log / webhook_deliveries / health_probe_history / password_reset_tokens / mutation_log / pipeline_runs / job_runs OUT — restoring those would re-emit historical events (compliance hazard).
  • Storage. {DATA_DIR}/_backups/<org_slug>/<YYYYMMDD-HHMMSS>.tar.gz. The underscore prefix keeps backups out of the per-tenant tree the disk-usage counters mirror, so a backup of org X doesn't double-count against X's storage quota. Auto-prune keeps 20 backups per org.
  • Restore. Single transaction: DELETE the org's rows (a WHERE-scoped wipe, since TRUNCATE accepts no WHERE clause), re-INSERT from CSV, drop+recreate per-tenant uploads schemas. format_version on the manifest fails fast on tarballs from a future schema layout, so there is no silent partial restore. The endpoint demands RESTORE <slug> typed verbatim (GitHub-style guard), enforced server-side. The org_id on the manifest is a belt-and-braces check against a mis-uploaded tarball silently restoring one tenant's data into another.
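
The three restore guards (typed confirmation, format_version, manifest org_id) might be combined into one pre-flight check like the sketch below. The function name, the exception type, and the value of SUPPORTED_FORMAT_VERSION are hypothetical; only the manifest keys format_version and org_id come from the notes above.

```python
SUPPORTED_FORMAT_VERSION = 1  # hypothetical: highest manifest layout this build understands


class RestoreRefused(Exception):
    """Raised before any data is touched when a restore request fails a guard."""


def check_restore_request(
    manifest: dict, target_org_id: int, target_slug: str, confirmation: str
) -> None:
    """Run every guard up front so a refused restore changes nothing."""
    if confirmation != f"RESTORE {target_slug}":
        raise RestoreRefused("confirmation text must be typed verbatim")
    version = manifest.get("format_version")
    if not isinstance(version, int) or version > SUPPORTED_FORMAT_VERSION:
        raise RestoreRefused(f"unsupported manifest format_version: {version!r}")
    if manifest.get("org_id") != target_org_id:
        raise RestoreRefused("tarball belongs to a different org")
```

Running all guards before the transaction opens keeps a refused restore free of side effects.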

Audit log overhaul

Compliance + post-incident investigation gets the tools it needs.

  • Search (q=…) — substring across action / resource_type / resource_id / details::text / ip_address / user.full_name / user.username. JSONB details is cast to text so the search hits structured payloads too. UI debounces 350ms.
  • Date range: from_date / to_date (ISO 8601). The to-date is extended to 23:59:59 before going on the wire, so "today" actually includes today.
  • CSV export — fixed column order (locked by integration test — operator scripts grep on the header). Same filters as the list endpoint. Defensive 100k-row ceiling with a "WARNING: export was truncated" footer in the file rather than silent truncation. X-Audit-Export-Row-Count header for verification.
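
Two of the behaviors above are easy to show in miniature: extending to_date to the end of the day, and writing an explicit truncation footer instead of cutting the file off silently. The function names are hypothetical and the column names are supplied by the caller, since the real fixed column order is not listed here.

```python
import csv
import io
from datetime import date, datetime, time
from typing import Iterable


def inclusive_to_date(to_date: str) -> str:
    """Turn an ISO date into a timestamp at 23:59:59 so the whole day is included."""
    day = date.fromisoformat(to_date)
    return datetime.combine(day, time(23, 59, 59)).isoformat()


def export_csv(
    columns: list[str], rows: Iterable[dict], max_rows: int = 100_000
) -> tuple[str, int]:
    """Write rows in fixed column order; return the CSV text and the row count."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    written = 0
    truncated = False
    for row in rows:
        if written >= max_rows:
            # A further row exists beyond the ceiling, so flag the cut.
            truncated = True
            break
        writer.writerow([row.get(col, "") for col in columns])
        written += 1
    if truncated:
        buf.write("WARNING: export was truncated\n")
    return buf.getvalue(), written
```

The returned count is the kind of value a header such as X-Audit-Export-Row-Count could carry for verification.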

Branding upload + /health 24h trend

  • Logo upload. Settings now has Upload + Clear next to the existing free-text Logo URL field. Stored at {DATA_DIR}/_assets/branding/<org_slug>/logo.<ext>, served via the existing nginx /static-assets/ alias. 2 MiB hard cap; allowlisted formats: PNG / JPG / SVG / WebP / GIF. URL carries ?v=<unix_ts> so cached versions surrender immediately on rotation.
  • /health 24h trend chart. The v0.0.44 /health page showed a live snapshot but no history. v0.0.45 adds a persisted health_probe_history table + a horizontal status band per component below the live cards, with a range selector (1h / 6h / 24h / 3d / 7d). Persistence is dedupe-aware: insert when overall_status flips OR every 5 minutes for stable systems. ~12 rows/hr steady-state. The history endpoint is supplementary — never blanks the page if missing.
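
The dedupe-aware persistence rule (write on a status flip, otherwise every 5 minutes) reduces to a small decision function. This sketch uses hypothetical names and assumes the caller tracks the last persisted status and timestamp.

```python
from datetime import datetime, timedelta
from typing import Optional

HEARTBEAT = timedelta(minutes=5)  # stable systems still get a row every 5 minutes


def should_persist(
    last_status: Optional[str],
    last_written_at: Optional[datetime],
    status: str,
    now: datetime,
) -> bool:
    """Decide whether the current probe result deserves a history row."""
    if last_status is None or last_written_at is None:
        return True  # nothing persisted yet
    if status != last_status:
        return True  # overall_status flipped
    return now - last_written_at >= HEARTBEAT
```

With a stable status this yields one row per 5 minutes, which matches the ~12 rows/hr steady-state figure.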

Upgrade machinery hardening

Three install-time changes baked in so v0.0.45+ deploys to ANY customer (single-tier, multi-tier, Ubuntu, RHEL/Rocky/CentOS) work without manual recovery:

  • schema_sync.py — auto-heal column drift on every upgrade. Parses every CREATE TABLE in init_schema.sql and ALTERs in any column that's missing from the live DB. Pure additive: never DROPs, never alters existing column types, never touches data. cmd_update runs it after vendor install, before migrate.py + service restart. Going forward: every honeyframe update <tarball> self-heals additive column drift; operators no longer chase 500s for missing-column class regressions.
  • SELinux auto-relabel. cmd_install + cmd_update now run restorecon -R INSTALL_DIR after copy. Closes the Rocky Linux 9.4 class of failure where tarball extraction inherits /tmp's user_tmp_t label and systemd's ExecStart rejects exec on it. Detects SELinux state via getenforce; no-op on Ubuntu/Debian and on systems without restorecon.
  • Cross-tier migration WARN. migrate.py now classifies "schema X does not exist" / "relation X does not exist" exceptions as WARN with "(likely tier-not-applicable, install OK)" and returns success. Real syntax errors and constraint violations still FAIL. Closes the class where a healthy single-tier install reported Done: 3 applied, 1 skipped, 4 failed.
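
The WARN-versus-FAIL split in the cross-tier bullet can be approximated by matching the error text. This is an illustrative sketch only: the function name and regular expressions are assumptions, written against the usual PostgreSQL wording of these errors.

```python
import re

# Messages that usually mean "this migration targets a tier that is not installed".
_TIER_NOT_APPLICABLE = (
    re.compile(r'schema "?[\w.]+"? does not exist', re.IGNORECASE),
    re.compile(r'relation "?[\w.]+"? does not exist', re.IGNORECASE),
)


def classify_migration_error(message: str) -> str:
    """Return "WARN" for tier-not-applicable errors and "FAIL" for everything else."""
    if any(pattern.search(message) for pattern in _TIER_NOT_APPLICABLE):
        return "WARN"
    return "FAIL"
```

Syntax errors and constraint violations fall through to FAIL, so only the two missing-object cases are downgraded.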

This release also ships a focused fix for per-tenant dbt workspaces whose pre-v0.0.29 schema refs (dataintel., hubstudio.) and pre-v0.0.41 install-path refs (/opt/hubstudio-data-intel/...) leaked into runtime. Two repeatable, idempotent migrations close this long tail; both ship with --dry-run. A compat-symlink alternative was explicitly rejected.

Migrations

Five new idempotent SQL migrations land in this release (all IF NOT EXISTS):

  • 2026-05-04_add_org_oidc_clients.sql
  • 2026-05-04_add_org_saml_clients.sql
  • 2026-05-04_add_org_smtp_settings.sql
  • 2026-05-04_add_health_probe_history.sql
  • Plus webhook_endpoints / webhook_deliveries / org_backups-related tables in init_schema.sql

After v0.0.45, schema_sync.py --apply self-heals additive column drift on every honeyframe update, so manual migration runs are needed only for table additions, type changes, drops, renames, or backfills.
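
To make "additive column drift" concrete, the sketch below parses CREATE TABLE statements and plans ALTER TABLE ... ADD COLUMN statements for columns missing from the live database. It is a simplified illustration rather than the actual schema_sync.py: the function names are hypothetical, the parser ignores SQL comments and quoted identifiers containing commas, and the live column sets are assumed to come from a prior INFORMATION_SCHEMA query.

```python
import re

CREATE_RE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)\s*\((.*?)\)\s*;",
    re.IGNORECASE | re.DOTALL,
)
# Leading keywords that mark a table constraint rather than a column.
NON_COLUMN = {"PRIMARY", "FOREIGN", "UNIQUE", "CHECK", "CONSTRAINT", "EXCLUDE", "LIKE"}


def split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def declared_columns(sql: str) -> dict[str, dict[str, str]]:
    """Map each declared table to {column_name: column_definition}."""
    tables: dict[str, dict[str, str]] = {}
    for table, body in CREATE_RE.findall(sql):
        columns: dict[str, str] = {}
        for part in split_top_level(body):
            pieces = part.split(None, 1)
            if len(pieces) < 2 or pieces[0].upper() in NON_COLUMN:
                continue
            columns[pieces[0].strip('"').lower()] = pieces[1]
        tables[table] = columns
    return tables


def plan_additive_alters(sql: str, live: dict[str, set[str]]) -> list[str]:
    """Plan ADD COLUMN statements only; never drops or retypes anything."""
    statements = []
    for table, columns in declared_columns(sql).items():
        existing = live.get(table.lower())
        if existing is None:
            continue  # missing tables are left to explicit migrations
        for name, definition in columns.items():
            if name not in existing:
                statements.append(
                    f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {definition};"
                )
    return statements
```

Tables absent from the live database are skipped on purpose, which mirrors the point that table additions still need a manual migration.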

Connector secret backfill (v0.0.44 leftover) is now mandatory. Run paas/scripts/migrations/2026-05-04_encrypt_connector_secrets.py once per tenant if you didn't run it during the v0.0.44 cycle.

What's not in this release

  • SCIM provisioning — SSO covers sign-in, not user provisioning.
  • Webhooks WAL replay — the delivery log captures every attempt; there's no "replay all failed in window X" admin button yet.
  • Backup encryption-at-rest — tarballs live in DATA_DIR/_backups/ with normal filesystem perms.
  • Audit log retention policy — search + export are user-facing; automated retention/purge is deferred.