Custom systemd Units
Honeyframe services run as systemd units. The standard install renders them from templates shipped in the systemd/ directory of the source tree, substituting placeholders for paths and the customer display name. This page covers the unit anatomy, where to drop overrides, and the operational guarantees the units provide.
Core services
| Unit | Port | What it runs |
|---|---|---|
| hub-platform.service | 8001 | Platform API (paas/backend/main.py) |
| hub-cloud.service | 8002 | Cloud API (iaas/backend/main.py) |
The unit filenames retain the legacy hub-* prefix even though the brand is Honeyframe — renaming would break running deployments. The Description= line carries the customer-visible "Honeyframe Platform / Cloud" label. Vertical SaaS units (e.g. tenant-specific service APIs) follow the same pattern but are not part of the core install.
Unit anatomy
The Platform unit is the canonical example. The rendered file installed at /etc/systemd/system/hub-platform.service looks like:
[Unit]
Description=Honeyframe Platform API — Acme Corp
After=network.target
StartLimitIntervalSec=300
StartLimitBurst=5
[Service]
Type=simple
User=root
WorkingDirectory=/opt/honeyframe/paas/backend
EnvironmentFile=/opt/honeyframe/paas/backend/.env
Environment=DBT_PROJECT_ROOT=/opt/honeyframe/paas/dbt
Environment=DBT_PROFILES_DIR=/data/honeyframe/.dbt
Environment=DATA_DIR=/data/honeyframe
Environment=INSTALL_DIR=/opt/honeyframe
Environment=PYTHONPATH=/opt/honeyframe/paas/backend/plugins
ExecStartPre=-/bin/sh -c "fuser -k 8001/tcp 2>/dev/null || true"
ExecStartPre=/bin/sleep 2
ExecStart=/usr/bin/python3 -m uvicorn main:app --host 0.0.0.0 --port 8001
Restart=on-failure
RestartSec=10
MemoryMax=1500M
[Install]
WantedBy=multi-user.target
Why these directives
- StartLimitIntervalSec=300 / StartLimitBurst=5: caps systemd at 5 restarts per 5 minutes before giving up. Without this, a crashlooping service can churn forever and mask the underlying problem. Note: these go in [Unit], not [Service]; systemd silently ignores them if misplaced.
- User=root: the standard install runs as root because nginx, certbot, and the Python services share log paths under /var/log/ and the data directory is owned by root. To run as an unprivileged user, see Running as a non-root user below.
- PYTHONPATH=$INSTALL_DIR/paas/backend/plugins: optional plugins (chromadb, faiss, cloud connectors) install into plugins/ if you run setup-customer.sh --install-plugins. The path is safe even when the directory doesn't exist.
- ExecStartPre=fuser -k 8001/tcp: kills any orphaned process holding the port before starting. Defensive against partial-restart states. The leading - means "ignore failure" (the port may be free).
- Restart=on-failure / RestartSec=10: systemd restarts the service when it exits non-zero or is killed by a signal (which includes an OOM kill), with a 10-second cooldown.
- MemoryMax=1500M: kernel-enforced memory cap. The service is OOM-killed if it exceeds this; combined with Restart=on-failure, that translates a memory leak into a 10-second blip rather than a host-wide swap death spiral.
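Since the StartLimit* directives only count when they sit under [Unit], a quick scan of the rendered file can confirm where they landed. The following is a minimal sketch: start_limit_sections is a hypothetical helper name and is not shipped with Honeyframe.

```shell
# Print "<section> <directive>" for every StartLimit* line in a unit file,
# so a directive that ended up under [Service] is easy to spot.
# start_limit_sections is a hypothetical helper for illustration only.
start_limit_sections() (
    awk '
        /^\[.*\]$/ { section = $0; next }        # remember the current section header
        /^StartLimit(IntervalSec|Burst)=/ {      # directive of interest
            split($0, kv, "=")
            print section, kv[1]
        }
    ' "$1"
)
```

Run against /etc/systemd/system/hub-platform.service as rendered above, both directives should be reported next to [Unit]. Any line that starts with [Service] points at a misplaced directive.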
Customizing without forking
Use systemd's drop-in mechanism rather than editing the unit file directly. Create:
mkdir -p /etc/systemd/system/hub-platform.service.d/
$EDITOR /etc/systemd/system/hub-platform.service.d/local.conf
Example overrides:
# /etc/systemd/system/hub-platform.service.d/local.conf
# Tighter memory cap for a small VPS
[Service]
MemoryMax=900M
# Run as a dedicated user (see "Running as a non-root user")
User=honeyframe
Group=honeyframe
# Add a private temp dir
PrivateTmp=true
Drop-ins are merged on top of the base unit. Reload after editing:
systemctl daemon-reload
systemctl restart hub-platform
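After the reload it can be worth confirming that the drop-in value is the one systemd actually applies. The sketch below compares the effective MemoryMax reported by systemctl show (which prints bytes) with the size written in the drop-in. The helper names size_to_bytes and memory_max_matches are assumptions made for this example; systemd parses the K/M/G suffixes with base 1024.

```shell
# size_to_bytes SIZE: convert a systemd-style size (K/M/G suffix, base 1024)
# to bytes. Values without a suffix are printed unchanged.
size_to_bytes() (
    value=$1
    case "$value" in
        *K) echo $(( ${value%K} * 1024 )) ;;
        *M) echo $(( ${value%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${value%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$value" ;;
    esac
)

# memory_max_matches UNIT SIZE: succeed when the unit's effective MemoryMax
# equals SIZE. Hypothetical helper, not part of the Honeyframe install.
memory_max_matches() (
    unit=$1
    want=$(size_to_bytes "$2")
    have=$(systemctl show "$unit" -p MemoryMax --value)
    [ "$have" = "$want" ]
)
```

For the example override above, memory_max_matches hub-platform 900M should succeed once the daemon has been reloaded. systemctl cat hub-platform prints the base unit followed by every drop-in, which helps when the check fails.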
Existing drop-ins shipped by the install:
- /etc/systemd/system/hub-platform.service.d/er-limits.conf: entity-resolution memory tuning. Untouched in most installs.
Running as a non-root user
To run the services as honeyframe:
- useradd -r -s /usr/sbin/nologin honeyframe
- chown -R honeyframe:honeyframe /opt/honeyframe /data/honeyframe
- Drop in User=honeyframe, Group=honeyframe per the example above.
- Drop the fuser -k <port>/tcp ExecStartPre (the unprivileged user can't kill arbitrary processes). Replace it with TimeoutStopSec=15 and KillMode=mixed to give graceful shutdown a window.
- Reload and restart.
The User=root default is convenient, not required. The Python code does not chown files at runtime.
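The drop-in part of the steps above can be sketched as a small shell function. This is an illustration under assumptions: the function name write_nonroot_dropin and the file name nonroot.conf are made up for the example. It relies on the systemd rule that an empty assignment resets a list-type setting, so the bare ExecStartPre= line clears every ExecStartPre entry inherited from the base unit, including the sleep.

```shell
# write_nonroot_dropin DIR: write a drop-in that switches the unit to the
# honeyframe user and removes the inherited ExecStartPre commands.
# Function and file name are hypothetical, chosen for this sketch.
write_nonroot_dropin() (
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/nonroot.conf" <<'EOF'
[Service]
User=honeyframe
Group=honeyframe
# Empty assignment resets the ExecStartPre= list from the base unit.
ExecStartPre=
TimeoutStopSec=15
KillMode=mixed
EOF
)
```

A typical call would be write_nonroot_dropin /etc/systemd/system/hub-platform.service.d, followed by systemctl daemon-reload and systemctl restart hub-platform. The useradd and chown steps still need to be run first.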
Operational commands
# Status
systemctl status hub-platform --no-pager
# Logs (last 50 lines + follow)
journalctl -u hub-platform -n 50 -f
# Restart all core services at once
systemctl restart hub-platform hub-cloud
# Health check across services and ports
for s in hub-platform hub-cloud; do
    printf "%-15s %s\n" "$s" "$(systemctl is-active "$s")"
done
for p in 8001 8002; do
    printf ":%s → %s\n" "$p" "$(curl -s -o /dev/null -w "%{http_code}" -m 3 "http://localhost:$p/api/health")"
done
Companion units
| Unit | Purpose |
|---|---|
| dbt-run.service + dbt-run.timer | Nightly dbt rebuild across all tenants |
| hub-scheduler.service | Pipeline scheduler (picks up cron-triggered jobs) |
| hub-oom-watchdog.service + hub-oom-watchdog.timer | OOM/memory watchdog for the hub-* units (v0.1.0) |
These are independent of the core API services and can be enabled or disabled per deployment.
OOM watchdog
Restart=on-failure has a blind spot: a single cgroup OOM-kill restarts the unit without ever reaching the failed state, so a bare OnFailure= handler never fires and a silent OOM-restart loop goes unnoticed. v0.1.0 adds a watchdog to catch this.
- hub-oom-watchdog.timer runs oom_watchdog.py as a oneshot every 2 minutes. The watchdog itself is capped at MemoryMax=128M so it can never be the thing that OOMs.
- The script is stdlib-only with no app imports, so it works even when the platform is down. It does a reactive journal-marker scan (for past OOM-kills) plus a proactive MemoryCurrent/MemoryMax threshold check.
- On a hit it journals a HONEYFRAME-OOM-ALERT marker (for SIEM ingestion) and sends a best-effort SMTP alert, throttled via a JSON state file.
- hub-scheduler.service also carries OnFailure=hub-oom-watchdog.service as a belt-and-suspenders trigger for hard failures (e.g. StartLimitBurst exhausted).
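The proactive half of the check can be reproduced by hand when investigating a unit. The sketch below is not the shipped oom_watchdog.py (which is Python and not shown here); it only illustrates the MemoryCurrent/MemoryMax ratio idea in shell. The helper names and the 85 percent threshold in the usage note are assumptions.

```shell
# mem_pct CURRENT MAX: print usage as an integer percentage, or "n/a" when
# either value is not a plain number (systemd reports "[not set]" or
# "infinity" in those cases).
mem_pct() (
    current=$1 max=$2
    for v in "$current" "$max"; do
        case "$v" in
            ''|*[!0-9]*) echo "n/a"; exit 0 ;;
        esac
    done
    [ "$max" -gt 0 ] || { echo "n/a"; exit 0; }
    echo $(( current * 100 / max ))
)

# unit_over_threshold UNIT PCT: succeed when the unit's memory usage is at or
# above PCT percent of its MemoryMax. Hypothetical helper for illustration.
unit_over_threshold() (
    unit=$1 threshold=$2
    current=$(systemctl show "$unit" -p MemoryCurrent --value)
    max=$(systemctl show "$unit" -p MemoryMax --value)
    pct=$(mem_pct "$current" "$max")
    [ "$pct" != "n/a" ] && [ "$pct" -ge "$threshold" ]
)
```

For example, unit_over_threshold hub-platform 85 succeeds when the Platform unit is using at least 85 percent of its cap. A unit without a MemoryMax never trips the check, because its limit is reported as infinity.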
Maintenance / upgrade page
From v0.1.7, nginx serves a branded maintenance page (nginx/maintenance.html) instead of a broken app or a failed SSO redirect when the backend is unreachable. It is wired two ways across all tiers:
- A /data/honeyframe/MAINTENANCE flag file (covers a whole planned upgrade window).
- An error_page 502/503/504 fallback (covers unplanned outages).
honeyframe update raises the flag after signature-verify and clears it after the restart (the boot gap is bridged by the 502 fallback, so the page never flickers off early); honeyframe rollback clears it too. For planned windows there is a dedicated subcommand:
honeyframe maintenance on # raise the flag
honeyframe maintenance off # clear it
honeyframe maintenance status # check current state
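For a planned window driven from a script, the main risk is leaving the flag raised after a failed step. A minimal wrapper, assuming only the on and off subcommands listed above (with_maintenance is a hypothetical name), could look like this:

```shell
# with_maintenance CMD [ARGS...]: raise the maintenance flag, run CMD, then
# clear the flag again whether or not CMD succeeded. The exit status of CMD
# is passed through to the caller.
with_maintenance() (
    honeyframe maintenance on || exit 1
    "$@"
    status=$?
    honeyframe maintenance off
    exit "$status"
)
```

A call such as with_maintenance systemctl restart hub-platform hub-cloud keeps the page up for the duration of the restart. honeyframe update already manages the flag itself, so the wrapper is only relevant for manual work.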