Version: v0.0.39

Honeyframe Cloud

Honeyframe Cloud is the managed tier — we run the infrastructure for you. Customers sign in at app.honeyframe.io, create a Space in seconds, and land on <slug>.app.honeyframe.io with their data, branding, and users in place. No VMs to provision, no certificates to renew, no setup-customer.sh to run.

This page documents the operator-facing surface of the Cloud tier: what's running where, how to read provisioning state, and what's automated versus invite-driven today. For the customer-facing flow ("how do I sign up?") see Deployment Tiers.

The Cloud tier is invite-driven in v0.0.39 — the public Launchpad signup is gated until Phase 4 (billing + automated wildcard cert renewal). The control plane and shared-tenant provisioner are shippable today.

Architecture

Three pieces, all inside our managed account:

Component | Where | Purpose
Launchpad UI | controlplane/frontend/, served from app.honeyframe.io (Vite build) | Customer-facing: signup, list Spaces, create new Space, view live provisioning events.
Control plane API | hub-control-plane.service (port 8004), control_plane.* schema | Owns the Space lifecycle. Authenticates customers, dispatches provisioning, streams events.
PaaS install | The existing Honeyframe install (paas/backend, paas/frontend) | Hosts the actual tenant data. Shared by every Cloud Space.

The control plane shares the PaaS JWT secret and Postgres instance with the PaaS install but lives in its own systemd unit and its own schema. It does not import paas/ or saas/; it is a peer service, not a tenant. Beyond the JWT secret and the database, the only other shared resource is the license signing key (so per-tenant licenses are issued by the same authority).

                   app.honeyframe.io
                          │
            ┌─────────────┴───────────────┐
            │                             │
      Launchpad UI            <slug>.app.honeyframe.io
      (Vite static)           (PaaS frontend, branded per-org)
            │                             │
            └─── /api/v1/* ───┐           └─── /api/* ───┐
                              │                          │
                     hub-control-plane             PaaS backend
                           :8004                       :8000
                              │                          │
                              └──────── Postgres ────────┘
                          (control_plane.* + honeyframe.*)

Spaces

A Space is the unit of tenancy. One Space = one customer org.

CREATE TABLE control_plane.spaces (
    id                 UUID PRIMARY KEY,
    slug               TEXT UNIQUE NOT NULL,  -- e.g. 'acme' → acme.app.honeyframe.io
    display_name       TEXT NOT NULL,
    owner_user_id      UUID NOT NULL,
    region             TEXT NOT NULL,         -- 'ap-southeast-5' for now
    tier               TEXT NOT NULL,         -- 'cloud' | 'enterprise' | 'self_hosted'
    status             TEXT NOT NULL,         -- see state machine below
    ecs_instance_id    TEXT,                  -- Alibaba ECS i-xxx (enterprise only)
    ecs_public_ip      TEXT,
    installer_version  TEXT,
    license_id         UUID,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT now(),
    ready_at           TIMESTAMPTZ,
    suspended_at       TIMESTAMPTZ,
    deleted_at         TIMESTAMPTZ,
    last_error         TEXT
);

A Space is provisioned by one of two paths depending on tier:

Path | Used by | What happens | Wall-clock
Shared multi-tenant (tier='cloud') | The Cloud tier | SQL inserts on the existing PaaS install: org → users → subscription → projects → nginx vhost → reload. No new VM. | ~30s
Per-VM via SSH (tier='enterprise') | The Enterprise tier (BYOC), and our internal pre-provisioning fleet | asyncssh.connect → SFTP-upload install.conf → exec setup-customer.sh --json-events → stream events live. | ~30 min – 4 hr

Both paths emit one row per state transition into control_plane.provisioning_events so the Space Detail page can show a live event log (GET /api/v1/spaces/{id}/events?since=<id>).
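
The since=<id> parameter suggests a cursor-style feed. The sketch below shows one way a client might drain it. It assumes event ids are increasing integers and that since is exclusive, neither of which is confirmed here. The fetch callable is a hypothetical stand-in for the HTTP call, so the cursor logic can be read (and run) without a network.

```python
from typing import Callable, Iterator

# Hypothetical fetcher: takes the last seen event id, returns newer events.
# A real one would wrap GET /api/v1/spaces/{id}/events?since=<id>.
Fetch = Callable[[int], list[dict]]


def drain_events(fetch: Fetch, since: int = 0, max_polls: int = 10) -> Iterator[dict]:
    """Yield events in order, advancing the cursor to the highest id seen."""
    cursor = since
    for _ in range(max_polls):
        batch = fetch(cursor)
        if not batch:
            break  # nothing new; a real client would sleep and poll again
        for event in batch:
            yield event
            cursor = max(cursor, event["id"])  # assumes integer, increasing ids


# Stand-in data and fetcher with a page size of 2 (both invented for the demo).
ROWS = [
    {"id": 1, "state": "requested"},
    {"id": 2, "state": "provisioning"},
    {"id": 3, "state": "ready"},
]


def fake_fetch(since: int) -> list[dict]:
    return [row for row in ROWS if row["id"] > since][:2]


seen = [event["state"] for event in drain_events(fake_fetch)]
```

Keeping the cursor on the client means a dropped poll never loses events: the next request simply asks again from the last id it actually processed.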

State machine

requested
↓ provisioner picks the row up (cloud or enterprise dispatch)
allocating
↓ ECS running + SSH reachable (enterprise only — cloud skips this)
provisioning
↓ installer succeeds, /api/version returns the expected version
seeding
↓ DNS / nginx / license activated
ready

Terminal states fall into two groups:

  • failed: last_error carries the truncated traceback. An operator clicks Retry to re-run the provisioner.
  • suspended / deleted: flipped by admin actions (Phase 4).

A stuck-job reaper runs in the control plane's lifespan: any space stuck more than 1 hour in a non-terminal state flips to failed with last_error="reaper: stuck > 1h" so the queue doesn't deadlock on a crashed provisioner.
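
The reaper rule can be sketched as a pure function over space rows. This is an illustration under assumptions, not the control plane's code: the source does not say which timestamp the reaper compares against, so last_transition_at is a hypothetical field, and the dict-based rows stand in for database records.

```python
from datetime import datetime, timedelta, timezone

# Non-terminal states, taken from the state machine above.
NON_TERMINAL = {"requested", "allocating", "provisioning", "seeding"}
STUCK_AFTER = timedelta(hours=1)


def reap_stuck(spaces: list[dict], now: datetime) -> list[dict]:
    """Flip spaces stuck in a non-terminal state for over an hour to failed.

    "last_transition_at" is a hypothetical field used for illustration.
    """
    reaped = []
    for space in spaces:
        stuck_for = now - space["last_transition_at"]
        if space["status"] in NON_TERMINAL and stuck_for > STUCK_AFTER:
            space["status"] = "failed"
            space["last_error"] = "reaper: stuck > 1h"
            reaped.append(space)
    return reaped


now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
fleet = [
    {"slug": "acme", "status": "provisioning", "last_transition_at": now - timedelta(hours=2)},
    {"slug": "beta", "status": "seeding", "last_transition_at": now - timedelta(minutes=5)},
    {"slug": "gamma", "status": "ready", "last_transition_at": now - timedelta(days=3)},
]
reaped = reap_stuck(fleet, now)
```

Only acme is reaped here: beta is still inside the one-hour window and gamma is already in a state the reaper ignores.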

Provisioning a shared-tenant Space

Eleven steps, each emitting one event row:

  1. INSERT honeyframe.organizations (slug, display_name, billing_email)
  2. INSERT honeyframe.users for the owner (must_reset_password=true)
  3. INSERT honeyframe.user_orgs linking the owner to the org as admin
  4. (Owner email confirmed — the bootstrap password is delivered by the Launchpad UI, not emailed in cleartext)
  5. INSERT honeyframe.subscriptions (product='hub_platform', deployment_tier='shared') with license_tier='starter' (the constraint allows starter|professional|enterprise only — 'cloud' would fail it)
  6. INSERT honeyframe.projects (org_id, name='default')
  7. INSERT honeyframe.project_members for the owner as admin of the default project
  8. Render /etc/nginx/conf.d/space-<slug>.conf from the slug template
  9. Validate the rendered config (nginx -t)
  10. systemctl reload nginx to pick up the new vhost
  11. Issue HTTP-01 cert for <slug>.app.honeyframe.io (per-slug today; DNS-01 wildcard deferred to Phase 5)

On success the Space lands in ready and the customer can log in at <slug>.app.honeyframe.io.
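
Steps 8 to 10 amount to rendering a file from a template, validating it, and reloading nginx. The sketch below covers only the rendering step. The template body and the slug validation rule are assumptions for illustration, since the real slug template is not shown on this page. Only the output path and the hostname pattern come from the steps above.

```python
import re
from string import Template

# Assumed rule: a slug must be a valid lowercase DNS label.
SLUG_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")

# Hypothetical template; the real one may differ substantially.
VHOST_TEMPLATE = Template(
    "server {\n"
    "    listen 80;\n"
    "    server_name ${slug}.app.honeyframe.io;\n"
    "    location / {\n"
    "        proxy_pass http://127.0.0.1:8000;\n"
    "    }\n"
    "}\n"
)


def render_vhost(slug: str) -> tuple[str, str]:
    """Return (path, contents) for a Space's nginx vhost file."""
    if not SLUG_RE.fullmatch(slug):
        # Rejecting odd slugs keeps them out of both the path and the config.
        raise ValueError(f"invalid slug: {slug!r}")
    path = f"/etc/nginx/conf.d/space-{slug}.conf"
    return path, VHOST_TEMPLATE.substitute(slug=slug)


path, conf = render_vhost("acme")
```

After writing the file, the provisioner would run nginx -t and only then systemctl reload nginx, so a bad render never takes down the vhosts that are already serving.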

The PaaS frontend reads window.location.hostname on load. If it matches <slug>.app.honeyframe.io, it calls /api/branding?slug=<slug> so the public branding endpoint overlays that org's logo + primary color from honeyframe.organizations onto the global defaults (the branded login surface).
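
The hostname match can be illustrated with a small function. It is written in Python purely for illustration; the actual check runs in the browser, and the function name and the single-label rule are assumptions rather than the frontend's real logic.

```python
from typing import Optional

BASE_DOMAIN = "app.honeyframe.io"


def slug_from_host(hostname: str) -> Optional[str]:
    """Return the Space slug for <slug>.app.honeyframe.io, else None."""
    host = hostname.lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None  # localhost, IPs, custom domains: global branding
    slug = host[: -len(suffix)]
    # Assumption: only a single label counts as a slug.
    if not slug or "." in slug:
        return None
    return slug


# A match would trigger the call to /api/branding?slug=<slug>.
branded = slug_from_host("acme.app.honeyframe.io")
```

The bare Launchpad host (app.honeyframe.io) returns None as well, so it never requests per-org branding.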

Provisioning a per-VM Space (Enterprise / SSH)

When tier='enterprise' the dispatcher picks a different code path: provisioner.py (asyncssh-based).

  1. Allocate ECS: Alibaba SDK RunInstances against the configured warm-pool image, or pick from a pre-allocated warm pool if any.
  2. Wait for SSH: poll until asyncssh.connect succeeds (typically under 60s for warm-pool, ~90s for fresh).
  3. Render install.conf: render_install_conf(space) maps spaces.install_conf JSONB onto the YAML grammar that setup-customer.sh reads (customer / tiers / database / domains / admin / features / openai).
  4. SFTP upload: to /tmp/honeyframe-install-{space_id}.conf, chmod 600 (the file contains the DB password).
  5. Exec installer: sudo bash {installer_path} --config {tmp_path} --json-events.
  6. Stream events: _stream_events reads stdout line-by-line, runs json.loads on each line, and INSERTs one row per line into provisioning_events. Lines that fail to parse are preserved as raw_log/info so forensics never loses output.
  7. Drain stderr: a parallel task; non-JSON output (sudo banner, pip output, ssh chatter) lands in the feed as warn so the operator sees it.
  8. Wall-clock cap: 4 hours. (--compile-backend can take 3 hr on slow ECS because of the Nuitka link step.)
  9. On success: write /etc/nginx/conf.d/space-<slug>.conf on the new VM, run nginx -t + systemctl reload nginx, mark the space ready.
  10. On failure: flip to failed, store the truncated traceback in last_error, leave the VM running for forensics (the operator decides whether to retry or destroy).
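
Steps 6 and 7 come down to classifying each output line before it is inserted. The sketch below shows that classification only. The row shape (kind / level / payload) and the function names are assumptions; only the raw_log/info fallback for stdout and the warn level for stderr come from the steps above.

```python
import json


def parse_stdout_line(line: str) -> dict:
    """Turn one installer stdout line into an event row (assumed shape)."""
    text = line.rstrip("\n")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict):
        # A structured event emitted by --json-events.
        return {"kind": "event", "level": payload.get("level", "info"), "payload": payload}
    # Unparseable (or non-object) lines are kept verbatim as raw_log/info.
    return {"kind": "raw_log", "level": "info", "payload": {"line": text}}


def parse_stderr_line(line: str) -> dict:
    """stderr is never parsed as JSON; it lands in the feed as warn."""
    return {"kind": "raw_log", "level": "warn", "payload": {"line": line.rstrip("\n")}}


rows = [
    parse_stdout_line('{"step": "nginx", "level": "info"}\n'),
    parse_stdout_line("Collecting pip...\n"),
    parse_stderr_line("sudo: unable to resolve host\n"),
]
```

Because every line produces a row, a crash halfway through the installer still leaves the events table with everything printed up to that point.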

Host-pubkey pinning is a documented TODO (Phase 2.5 follow-up). Use a dedicated SSH key per provisioner identity and rotate it via the secrets surface.

Admin surface

/api/v1/admin/spaces is the fleet view (admin-only — control-plane has its own admin role flag). Filter by status to find stuck jobs:

curl -H "Authorization: Bearer $CP_ADMIN_TOKEN" \
'https://app.honeyframe.io/api/v1/admin/spaces?status=failed'

POST /api/v1/admin/spaces/{id}/exec runs a one-off command on a Space's VM (enterprise tier only). Every call is audited into control_plane.audit_log with the full payload and the operator's user id.

For the cloud tier, the equivalent is direct SQL on the shared PaaS Postgres. A Space's data is identifiable by org_id — there is no separate VM to SSH into.

Tenant URLs

Pattern | What it resolves to
app.honeyframe.io | The Launchpad: signup, Spaces list, Create Space wizard
<slug>.app.honeyframe.io | A specific Cloud Space's PaaS frontend with the org's branding
controlplane.honeyframe.io | Internal admin console for the fleet (admin-only)
*.honeyframe.io (any other slug) | Reserved for future per-tenant Enterprise domains

The *.honeyframe.io wildcard A record was set up by the team on 2026-05-01. Cert posture today: per-slug HTTP-01 via certbot for each new tenant. The DNS-01 wildcard cutover is Phase 5; once it lands, a single cert covers every tenant slug.

API reference

The control plane lives on its own port (8004), so its surface is not under /api/dashboards/* or /api/connectors/* (those are the PaaS surfaces). The control-plane endpoints are all under /api/v1/, except the /api/health and /api/version probes:

Endpoint | Description
POST /api/v1/auth/signup | Create a control-plane user. Independent of the PaaS user surface.
POST /api/v1/auth/login | Returns a JWT scoped to the control-plane (signed with the same secret as the PaaS JWT, but the aud claim differs).
GET /api/v1/auth/me | Current control-plane user.
POST /api/v1/spaces | Create a Space (kicks off provisioner).
GET /api/v1/spaces | List spaces owned by the caller.
GET /api/v1/spaces/{id} | One space (owner-scoped).
POST /api/v1/spaces/{id}/retry | Re-run the provisioner on a failed space.
POST /api/v1/spaces/{id}/suspend | Flip status (Phase 4 stub).
POST /api/v1/spaces/{id}/resume | Flip status (Phase 4 stub).
DELETE /api/v1/spaces/{id} | Flip status (Phase 4 stub).
GET /api/v1/spaces/{id}/events?since=<id>&limit=N | Paginated provisioning event feed.
GET /api/v1/admin/spaces | Fleet view (admin-only).
POST /api/v1/admin/spaces/{id}/exec | One-off audited remote exec (enterprise tier).
GET /api/health | Liveness probe.
GET /api/version | Build + git SHA.
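
The login endpoint's note about a shared secret with a different aud claim is easier to see in code. The sketch below is a stdlib-only illustration of why the audience check matters. It assumes HS256 signing and the audience strings "control-plane" and "paas", none of which are confirmed here, and it skips expiry checks. A production service would use a maintained JWT library rather than this hand-rolled version.

```python
import base64
import hashlib
import hmac
import json


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, secret: bytes) -> str:
    """Mint an HS256 token (assumed algorithm) over the given claims."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64(sig)}"


def verify(token: str, secret: bytes, audience: str) -> dict:
    """Check the signature, then require the expected aud claim."""
    header, body, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _unb64(sig)):
        raise ValueError("bad signature")
    claims = json.loads(_unb64(body))
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    return claims


secret = b"shared-secret"  # placeholder; both services would load the real one
token = sign({"sub": "user-1", "aud": "control-plane"}, secret)  # hypothetical aud
claims = verify(token, secret, "control-plane")
```

With a shared secret the signature alone cannot tell the two services apart, so the aud check is what stops a control-plane token from being replayed against the PaaS backend.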

Observability

Two log streams worth watching:

  • Control-plane logs — systemd journal for hub-control-plane.service on the control-plane host. JSON-formatted; provisioning state transitions are logged at INFO, retries at WARN, traceback dumps at ERROR.
  • control_plane.provisioning_events — one row per state transition per space. Operator-facing; surfaced in the Launchpad UI's Space Detail page (2s polling). Joinable to control_plane.spaces.id.

For SSH-driven provisioning, the events table also captures every parsed line of the installer's --json-events output, plus stderr drain. A failed install leaves a complete forensic trail without needing to log into the target VM.

Gotchas

  • Cloud tier wildcard cert: HTTP-01 per-slug today. Let's Encrypt limits new certificates per registered domain (50 per week at the time of writing), so a tenant fleet growing past ~50 will hit rate limits. Move to DNS-01 wildcard before opening public signup.
  • Branded login depends on a hostname match. Non-*.app.honeyframe.io hosts (e.g., localhost, IP, platform.example.com, acme.example.com) fall through to global branding — this is by design.
  • license_tier='cloud' will fail the constraint. The constraint allows starter|professional|enterprise. The cloud-tier provisioner sets license_tier='starter' for new Spaces; tier-up via Phase 4 billing flow.
  • The control plane shares Postgres with the PaaS. Long-running PaaS migrations or vacuums can starve the provisioner. Schedule heavy maintenance during off-peak.
  • Admins can open any Space in the admin UI (was 404 before v0.0.39). Read-only by default; mutations go through audited /admin/spaces/{id}/exec only.

Roadmap

The Cloud tier is invite-driven in v0.0.39. Public self-serve signup needs Phase 4:

Phase | Status | Scope
1: Strategy lock | Done (v0.0.39) | Three-tier model documented; Alibaba reselling decoupled from the Honeyframe brand.
2: Control-plane API + SSH provisioner | Done (v0.0.39) | FastAPI service, asyncssh provisioner, admin fleet, e2e Playwright suite.
2.5: Shared multi-tenant provisioner | Done (v0.0.39) | Cloud-tier dispatch path; <slug>.app.honeyframe.io branded login; nginx vhost-per-slug.
3: Launchpad UI scaffold | Done (v0.0.39) | React + Vite + Tailwind v4, customer-facing signup + Spaces list + Create wizard + Space Detail.
4: Billing, suspension, automated cert renewal | v0.0.40+ | Stripe integration, true suspend/resume/delete, DNS-01 wildcard, public signup gate removed.
5: DNS-01 wildcard cert | v0.0.40+ | Single *.app.honeyframe.io cert via DNS-01 per renewal cycle; per-slug HTTP-01 retired.