v0.0.45 — Enterprise SSO, per-org outbound, upgrade self-heal
Released: 2026-05-04. The first release auto-cut by the CircleCI docs job.
Three big themes for tenant admins (the SSO trifecta, per-org outbound surfaces, logical backup/restore), a long-overdue audit-log overhaul, and a substantial round of upgrade-machinery hardening so future deploys self-heal column drift and don't trip on SELinux or single-tier installs.
Enterprise SSO trifecta
Three new sign-in paths land alongside the existing Google OAuth, all admin-configured per-org with no operator code changes:
- Microsoft (Azure AD / Entra ID) — works with personal MS accounts, work/school accounts, single-tenant directories, multi-tenant directories. Operator sets
MICROSOFT_CLIENT_ID+MICROSOFT_TENANT(common/ specific GUID /organizations/consumers); admin enables the button. Issuer check accepts both v1 (sts.windows.net/<tid>/) and v2 (login.microsoftonline.com/<tid>/v2.0) formats. JWKS cached 24h with cache-bust + retry on unknown kid. - Generic OIDC — paste an issuer URL + client_id; the platform follows the OIDC discovery spec for the rest. Works with Okta, Auth0, Keycloak, Ping, Cloudflare Access, anything OIDC-compliant. Per-(org_id, slug) addressing with a globally-unique slug, so the public sign-in URL doesn't carry org_id. Bait-endpoint defense: a discovery doc whose
issuerfield doesn't match what was configured fails fast — the platform never silently trusts whatever URL serves the discovery JSON. - SAML 2.0 — for ADFS / Azure AD SAML / Okta SAML / OneLogin / the long tail of old-line enterprise IdPs. Uses
signxmlfor signature verification (canonicalization + XML Signature Wrapping defense). Cert paste handles both full PEM blocks AND bare base64 — the most common operator-paste mistake. Email resolution walks the common attribute aliases (xmlsoap claim, urn:oid:0.9.2342..., plainemail, NameID fallback when email-shaped).
All four SSO paths share identical conventions: auto-provision new users with role=viewer, bypass must_reset_password (SSO is the identity-of-record), and emit <provider>_login audit rows. Domain-allowlist enforcement is consistent: empty allowlist means "any verified account"; populated allowlist is enforced before the JWT mints.
Per-org outbound — SMTP + webhooks
Two new tenant-controllable outbound paths, both admin-only and org-scoped at every step:
- Per-org SMTP override (
/smtp) — Cloud-tier tenants and Enterprise operators plug in their own SMTP host + credentials so invite + password-reset emails arrive from<team>@<their-domain>instead of the platform default. Better deliverability (recipient SPF/DKIM checks pass against the tenant's own DNS), better branding, no shared-relay quota. Reuses the LLM-key encryption derivation for the password — operators rotate once, both surfaces follow. TheTest sendbutton surfaces the SMTP error verbatim (502 + detail) so admins fix host/auth from the response, not the journal. Forgot-password resolves the user's primaryorg_idfromuser_orgsso reset emails route through the tenant relay even on the unauthenticated request. - Per-org webhooks (
/webhooks) — generic outbound event delivery. HMAC-signed payloads (sha256=<hex hmac>, same shape Stripe / GitHub use), async delivery, retry with exponential backoff (1m / 5m / 15m / 1h, then dropped), per-event-type subscription, and a delivery log with response code + excerpt for debugging. Empty events array means "all events" (Stripe convention). The dispatcher is best-effort — DB or transport failure never raises into the calling business path, so an audit-log write must succeed even when a tenant's relay is down. Secret echoed once on create / rotation; subsequent GETs only carry a 6-character preview tail.
Per-org logical backup + restore
Tenant-driven backup of the org's own data (/backups). An admin can snapshot before a risky migration / mass delete / experiment, AND restore without filing a ticket.
- Coverage. Every
honeyframe.*table that carries anorg_idcolumn, plus per-tenantt<pid>_uploads.*tables. Schema discovery walksINFORMATION_SCHEMAon every backup so future schema migrations are auto-included. A deny-list keepsaudit_log/webhook_deliveries/health_probe_history/password_reset_tokens/mutation_log/pipeline_runs/job_runsOUT — restoring those would re-emit historical events (compliance hazard). - Storage.
{DATA_DIR}/_backups/<org_slug>/<YYYYMMDD-HHMMSS>.tar.gz. The underscore prefix keeps backups out of the per-tenant tree the disk-usage counters mirror, so a backup of org X doesn't double-count against X's storage quota. Auto-prune keeps 20 backups per org. - Restore. Single transaction: TRUNCATE-with-WHERE the org's rows, re-INSERT from CSV, drop+recreate per-tenant uploads schemas.
format_versionon the manifest fail-fasts on tarballs from a future schema layout — no silent partial restore. The endpoint demandsRESTORE <slug>typed verbatim (GitHub-style guard), enforced server-side. Theorg_idon the manifest belt-and-braces against a mis-uploaded tarball silently restoring one tenant's data into another.
Audit log overhaul
Compliance + post-incident investigation gets the tools it needs.
- Search (
q=…) — substring acrossaction / resource_type / resource_id / details::text / ip_address / user.full_name / user.username. JSONBdetailsis cast to text so the search hits structured payloads too. UI debounces 350ms. - Date range —
from_date/to_date(ISO 8601). The to-date is extended to23:59:59before going on the wire, so "today" actually includes today. - CSV export — fixed column order (locked by integration test — operator scripts grep on the header). Same filters as the list endpoint. Defensive 100k-row ceiling with a "WARNING: export was truncated" footer in the file rather than silent truncation.
X-Audit-Export-Row-Countheader for verification.
Branding upload + /health 24h trend
- Logo upload. Settings now has Upload + Clear next to the existing free-text Logo URL field. Stored at
{DATA_DIR}/_assets/branding/<org_slug>/logo.<ext>, served via the existing nginx/static-assets/alias. 2 MiB hard cap; allowlisted formats: PNG / JPG / SVG / WebP / GIF. URL carries?v=<unix_ts>so cached versions surrender immediately on rotation. - /health 24h trend chart. The v0.0.44
/healthpage showed a live snapshot but no history. v0.0.45 adds a persistedhealth_probe_historytable + a horizontal status band per component below the live cards, with a range selector (1h / 6h / 24h / 3d / 7d). Persistence is dedupe-aware: insert whenoverall_statusflips OR every 5 minutes for stable systems. ~12 rows/hr steady-state. The history endpoint is supplementary — never blanks the page if missing.
Upgrade machinery hardening
Three install-time changes baked in so v0.0.45+ deploys to ANY customer (single-tier, multi-tier, Ubuntu, RHEL/Rocky/CentOS) work without manual recovery:
schema_sync.py— auto-heal column drift on every upgrade. Parses everyCREATE TABLEininit_schema.sqlandALTERs in any column that's missing from the live DB. Pure additive: never DROPs, never alters existing column types, never touches data.cmd_updateruns it after vendor install, beforemigrate.py+ service restart. Going forward: everyhoneyframe update <tarball>self-heals additive column drift; operators no longer chase 500s for missing-column class regressions.- SELinux auto-relabel.
cmd_install+cmd_updatenow runrestorecon -R INSTALL_DIRafter copy. Closes the Rocky Linux 9.4 class of failure where tarball extraction inherits/tmp'suser_tmp_tlabel and systemd'sExecStartrejects exec on it. Detects SELinux state viagetenforce; no-op on Ubuntu/Debian and on systems withoutrestorecon. - Cross-tier migration WARN.
migrate.pynow classifies "schema X does not exist" / "relation X does not exist" exceptions as WARN with "(likely tier-not-applicable, install OK)" and returns success. Real syntax errors and constraint violations still FAIL. Closes the class where a healthy single-tier install reportedDone: 3 applied, 1 skipped, 4 failed.
Plus a focused fix for the long-tail class of issue where per-tenant dbt workspaces had pre-v0.0.29 schema refs (dataintel., hubstudio.) and pre-v0.0.41 install-path refs (/opt/hubstudio-data-intel/...) leaking into runtime. Two repeatable, idempotent migrations close the long tail; both ship with --dry-run. Compat-symlink alternative was explicitly rejected.
Migrations
Five new idempotent SQL migrations land in this release (all IF NOT EXISTS):
2026-05-04_add_org_oidc_clients.sql2026-05-04_add_org_saml_clients.sql2026-05-04_add_org_smtp_settings.sql2026-05-04_add_health_probe_history.sql- Plus
webhook_endpoints/webhook_deliveries/org_backups-related tables ininit_schema.sql
After v0.0.45, schema_sync.py --apply self-heals additive column drift on every honeyframe update, so manual migration runs are needed only for table additions, type changes, drops, renames, or backfills.
Connector secret backfill (v0.0.44 leftover) is now mandatory. Run paas/scripts/migrations/2026-05-04_encrypt_connector_secrets.py once per tenant if you didn't run it during the v0.0.44 cycle.
What's not in this release
- SCIM provisioning — SSO covers sign-in, not user provisioning.
- Webhooks WAL replay — the delivery log captures every attempt; there's no "replay all failed in window X" admin button yet.
- Backup encryption-at-rest — tarballs live in
DATA_DIR/_backups/with normal filesystem perms. - Audit log retention policy — search + export are user-facing; automated retention/purge is deferred.