Version: v0.1.8

Custom systemd Units

Honeyframe services run as systemd units. The standard install renders them from templates shipped in the systemd/ directory of the source tree, substituting placeholders for paths and the customer display name. This page covers the unit anatomy, where to drop overrides, and the operational guarantees the units provide.

Core services

Unit	Port	What it runs
`hub-platform.service`	8001	Platform API (`paas/backend/main.py`)
`hub-cloud.service`	8002	Cloud API (`iaas/backend/main.py`)

The unit filenames retain the legacy hub-* prefix even though the brand is Honeyframe — renaming would break running deployments. The Description= line carries the customer-visible "Honeyframe Platform / Cloud" label. Vertical SaaS units (e.g. tenant-specific service APIs) follow the same pattern but are not part of the core install.

Unit anatomy

The Platform unit is the canonical example. The rendered file installed at /etc/systemd/system/hub-platform.service looks like:

[Unit]
Description=Honeyframe Platform API — Acme Corp
After=network.target
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=root
WorkingDirectory=/opt/honeyframe/paas/backend
EnvironmentFile=/opt/honeyframe/paas/backend/.env
Environment=DBT_PROJECT_ROOT=/opt/honeyframe/paas/dbt
Environment=DBT_PROFILES_DIR=/data/honeyframe/.dbt
Environment=DATA_DIR=/data/honeyframe
Environment=INSTALL_DIR=/opt/honeyframe
Environment=PYTHONPATH=/opt/honeyframe/paas/backend/plugins
ExecStartPre=-/bin/sh -c "fuser -k 8001/tcp 2>/dev/null || true"
ExecStartPre=/bin/sleep 2
ExecStart=/usr/bin/python3 -m uvicorn main:app --host 0.0.0.0 --port 8001
Restart=on-failure
RestartSec=10
MemoryMax=1500M

[Install]
WantedBy=multi-user.target

Why these directives

StartLimitIntervalSec=300 / StartLimitBurst=5: allows at most 5 start attempts within a 300-second window; the next attempt is refused and the unit is left in the failed state. Without this, a crashlooping service can churn forever and mask the underlying problem. Note: these go in [Unit], not [Service]. A misplaced StartLimitIntervalSec= is ignored with, at most, an unknown-key warning in the journal, and the unit still starts without the limit.
User=root: the standard install runs as root because nginx, certbot, and the Python services share log paths under /var/log/ and the data directory is owned by root. To run as an unprivileged user, see Running as a non-root user below.
PYTHONPATH=$INSTALL_DIR/paas/backend/plugins (/opt/honeyframe/paas/backend/plugins in the rendered unit): optional plugins (chromadb, faiss, cloud connectors) install into plugins/ if you run setup-customer.sh --install-plugins. The path is safe even when the directory doesn't exist, because Python skips search-path entries that are missing.
ExecStartPre=fuser -k 8001/tcp: kills any orphaned process holding the port before starting, as a defense against partial-restart states. The - prefix tells systemd to ignore a failure of this command (the port may already be free).
Restart=on-failure / RestartSec=10: systemd restarts the service after an unclean exit (a non-zero exit code, a fatal signal such as an OOM kill, or a timeout), waiting 10 seconds before each restart.
MemoryMax=1500M: kernel-enforced memory cap. The service is OOM-killed if it exceeds this; combined with Restart=on-failure, that turns a memory leak into a roughly 10-second blip rather than a host-wide swap death spiral.
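
To make the start-limit arithmetic concrete: a service that crashes immediately comes back roughly every 12 seconds (RestartSec=10 plus the 2-second ExecStartPre sleep), so it uses up its 5 allowed starts in under a minute. The sketch below is a simplified fixed-window model written for illustration only; it is not systemd's code, and the function name simulate_start_limit is hypothetical.

```python
def simulate_start_limit(attempt_times, interval_sec=300.0, burst=5):
    """Return one bool per attempted start: True if allowed, False if refused.

    Simplified model (an assumption, not systemd's implementation): a window
    opens at the first start, and a new window opens once more than
    interval_sec seconds have passed since the current window began.
    """
    window_begin = None
    starts_in_window = 0
    allowed = []
    for t in attempt_times:
        if window_begin is None or t - window_begin > interval_sec:
            # First start ever, or the previous window has expired.
            window_begin = t
            starts_in_window = 1
            allowed.append(True)
        elif starts_in_window < burst:
            starts_in_window += 1
            allowed.append(True)
        else:
            # Burst exhausted inside the window: the start is refused.
            allowed.append(False)
    return allowed


# A crashloop that dies at once and is retried every ~12 seconds.
crashloop = [i * 12.0 for i in range(6)]
print(simulate_start_limit(crashloop))
```

The first five attempts (at 0 to 48 seconds) are allowed and the sixth, at 60 seconds, is refused, which is the point where the unit stops restarting and needs operator attention.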

Customizing without forking

Use systemd's drop-in mechanism rather than editing the unit file directly. Create:

mkdir -p /etc/systemd/system/hub-platform.service.d/
$EDITOR /etc/systemd/system/hub-platform.service.d/local.conf

Example overrides:

# /etc/systemd/system/hub-platform.service.d/local.conf

# Tighter memory cap for a small VPS
[Service]
MemoryMax=900M

# Run as a dedicated user (see "Running as a non-root user")
User=honeyframe
Group=honeyframe

# Add a private temp dir
PrivateTmp=true

Drop-ins are merged on top of the base unit. Reload after editing:

systemctl daemon-reload
systemctl restart hub-platform
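
After the reload it is worth confirming that the override actually took effect. One way is to read the effective properties back with systemctl show. The following is a minimal sketch in Python (the language of the services themselves); the helper names parse_show and effective_properties are hypothetical and not part of Honeyframe.

```python
import subprocess


def parse_show(text):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that have no '=' at all
            props[key] = value
    return props


def effective_properties(unit, names):
    """Ask systemd for the merged (base unit plus drop-ins) values of `names`."""
    cmd = ["systemctl", "show", unit]
    for name in names:
        cmd += ["-p", name]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_show(result.stdout)
```

For example, effective_properties("hub-platform", ["MemoryMax", "User"]) should report MemoryMax in bytes (900M corresponds to 943718400) and User as honeyframe once the example override above is active.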

Existing drop-ins shipped by the install:

/etc/systemd/system/hub-platform.service.d/er-limits.conf — entity-resolution memory tuning. Untouched in most installs.

Running as a non-root user

To run the services as honeyframe:

useradd -r -s /usr/sbin/nologin honeyframe
chown -R honeyframe:honeyframe /opt/honeyframe /data/honeyframe
Drop in User=honeyframe, Group=honeyframe per the example above.
Drop the fuser -k <port>/tcp ExecStartPre (an unprivileged user can't kill processes it does not own). Replace it with TimeoutStopSec=15 and KillMode=mixed to give graceful shutdown a window.
Reload and restart.

The User=root default is convenient, not required. The Python code does not chown files at runtime.
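
The User=/Group= change and the ExecStartPre replacement can live in a single drop-in. The sketch below renders one from Python; the function names and the file name nonroot.conf are hypothetical. It relies on standard systemd behavior: an empty ExecStartPre= assignment clears every inherited pre-start command, so the 2-second sleep is removed along with the fuser call.

```python
from pathlib import Path


def render_nonroot_dropin(user="honeyframe", group="honeyframe"):
    """Return drop-in text that runs the unit as an unprivileged user."""
    lines = [
        "[Service]",
        f"User={user}",
        f"Group={group}",
        "# An empty assignment clears the inherited ExecStartPre= list,",
        "# removing both the fuser kill and the 2-second sleep.",
        "ExecStartPre=",
        "TimeoutStopSec=15",
        "KillMode=mixed",
    ]
    return "\n".join(lines) + "\n"


def write_dropin(unit, text, root=Path("/etc/systemd/system"), name="nonroot.conf"):
    """Write the drop-in under <root>/<unit>.d/ and return its path."""
    dropin_dir = Path(root) / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    target = dropin_dir / name
    target.write_text(text)
    return target
```

Calling write_dropin("hub-platform.service", render_nonroot_dropin()) as root would place the file next to any existing drop-ins; a systemctl daemon-reload and restart are still needed afterwards.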

Operational commands

# Status
systemctl status hub-platform --no-pager

# Logs (last 50 lines + follow)
journalctl -u hub-platform -n 50 -f

# Restart all core services at once
systemctl restart hub-platform hub-cloud

# Health check across services and ports
for s in hub-platform hub-cloud; do
  printf "%-15s %s\n" "$s" "$(systemctl is-active "$s")"
done
for p in 8001 8002; do
  printf ":%s → %s\n" "$p" "$(curl -s -o /dev/null -w "%{http_code}" -m 3 http://localhost:$p/api/health)"
done

Companion units

Unit	Purpose
`dbt-run.service` + `dbt-run.timer`	Nightly dbt rebuild across all tenants
`hub-scheduler.service`	Pipeline scheduler — picks up cron-triggered jobs
`hub-oom-watchdog.service` + `hub-oom-watchdog.timer`	OOM/memory watchdog for the `hub-*` units (v0.1.0)

These are independent of the two core API services and can be enabled or disabled per deployment.

OOM watchdog

Restart=on-failure has a blind spot: a single cgroup OOM-kill restarts the unit without ever reaching the failed state, so a bare OnFailure= handler never fires and a silent OOM-restart loop goes unnoticed. v0.1.0 adds a watchdog to catch this.

hub-oom-watchdog.timer runs oom_watchdog.py as a oneshot every 2 minutes. The watchdog itself is capped at MemoryMax=128M so it can never be the thing that OOMs.
The script is stdlib-only with no app imports, so it works even when the platform is down. It does a reactive journal-marker scan (for past OOM-kills) plus a proactive MemoryCurrent/MemoryMax threshold check.
On a hit it journals a HONEYFRAME-OOM-ALERT marker (for SIEM ingestion) and sends a best-effort SMTP alert, throttled via a JSON state file.
hub-scheduler.service also carries OnFailure=hub-oom-watchdog.service as a belt-and-suspenders trigger for hard failures (e.g. StartLimitBurst exhausted).
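
The proactive threshold check and the alert throttle described above can be sketched with the standard library alone. This is not the shipped oom_watchdog.py: the 0.9 ratio, the one-hour gap, and all function names are assumptions chosen for illustration. The inputs are the strings systemd reports for MemoryCurrent and MemoryMax, which may be non-numeric (for example "[not set]" or "infinity").

```python
import json
from pathlib import Path

ALERT_RATIO = 0.9           # hypothetical: alert at 90% of MemoryMax
MIN_ALERT_GAP_SEC = 3600.0  # hypothetical: at most one alert per unit per hour


def memory_ratio(memory_current, memory_max):
    """Return current/max as a float, or None if either value is not numeric."""
    if not (memory_current.isdigit() and memory_max.isdigit()):
        return None
    maximum = int(memory_max)
    if maximum == 0:
        return None
    return int(memory_current) / maximum


def is_near_limit(memory_current, memory_max, ratio=ALERT_RATIO):
    """True when usage is at or above `ratio` of the configured cap."""
    value = memory_ratio(memory_current, memory_max)
    return value is not None and value >= ratio


def load_state(path):
    """Read the JSON throttle state; fall back to empty on any read error."""
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def should_alert(state, unit, now, min_gap=MIN_ALERT_GAP_SEC):
    """Return True and record `now` unless this unit alerted too recently."""
    last = state.get(unit)
    if last is not None and now - last < min_gap:
        return False
    state[unit] = now
    return True
```

Keeping should_alert free of file I/O means the throttle logic can be exercised without touching the state file; the caller would persist the updated dict with json.dumps after a successful alert.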

Maintenance / upgrade page

From v0.1.7, nginx serves a branded maintenance page (nginx/maintenance.html) instead of a broken app or a failed SSO redirect when the backend is unreachable. It is wired two ways across all tiers:

A /data/honeyframe/MAINTENANCE flag file (covers a whole planned upgrade window).
An error_page 502/503/504 fallback (covers unplanned outages).

honeyframe update raises the flag after signature-verify and clears it after the restart (the boot gap is bridged by the 502 fallback, so the page never flickers off early); honeyframe rollback clears it too. For planned windows there is a dedicated subcommand:

honeyframe maintenance on        # raise the flag
honeyframe maintenance off       # clear it
honeyframe maintenance status    # check current state
