Version: v0.0.68

Scheduler

The Scheduler is Honeyframe's single surface for time-driven automation: dataset refreshes, recipe builds, AI-agent runs, and email reports. It replaces three separate v0.0.37 surfaces (pipeline_schedules, report_schedules, and the per-project cron table), which v0.0.38 consolidated into one.

Concepts

A schedule binds a trigger to a scenario. A scenario is a sequence of one or more steps.

| Term | Meaning |
| --- | --- |
| Trigger | When the scenario fires. Cron expression, interval, or "on upstream success". |
| Scenario | The unit of work the scheduler executes. One scenario can have many steps. |
| Step | A single action: run a recipe, sync a dataset, send a report, run a custom Python block. |
| Scenario template | A reusable scenario shape (steps + parameters) that any project in the org can instantiate. |
| Per-project enable matrix | Which scenarios are enabled for which project. A scenario can be defined org-wide and turned on for some projects but not others. |

Scenarios are configured under Project Settings → Schedules. Org-wide templates are configured under Admin → Scheduler templates.

Step types

| Step | What it does |
| --- | --- |
| run_recipe | Build one or more recipes via dbt. Equivalent to a Flow subgraph build. |
| sync_dataset | Trigger a connector → dataset sync. The same action as the dataset detail page Sync now button. |
| run_python | Execute a Python recipe. Standalone; does not participate in the dbt build. |
| run_agent | Invoke an AI agent against a row stream. |
| send_email_report | Render a dashboard or set of dashboards as PDF/HTML and email to a recipient list. (v0.0.38; replaces the standalone report_schedules table.) |
| snapshot_dashboards | Snapshot every dashboard in a project into honeyframe.dashboard_revisions, skipping dashboards already snapshotted within skip_hours. Companion to the editor's Dashboards → Version history drawer for dashboards that go untouched between edits. (v0.0.39) |

Steps within a scenario run sequentially by default. Mark a step as parallel: true in the scenario config to fan out parallel branches; the scheduler waits for all parallel steps to complete before continuing to the next sequential step.
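The fan-out and barrier behavior described above can be sketched in a few lines of Python. This is an illustration, not the scheduler's actual implementation: `group_into_stages`, `run_scenario`, and the `run_step` callback are hypothetical names, and the sketch assumes that consecutive steps marked `parallel: true` form one fan-out group.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Step = dict[str, Any]


def group_into_stages(steps: list[Step]) -> list[list[Step]]:
    """Group consecutive parallel steps into one stage; other steps stand alone.

    Assumption: adjacent `parallel: true` steps belong to the same fan-out.
    """
    stages: list[list[Step]] = []
    for step in steps:
        previous = stages[-1] if stages else None
        if step.get("parallel") and previous and previous[0].get("parallel"):
            previous.append(step)  # extend the current fan-out group
        else:
            stages.append([step])
    return stages


def run_scenario(steps: list[Step], run_step: Callable[[Step], Any]) -> list[Any]:
    """Run stages in order; steps inside a multi-step stage run concurrently."""
    results: list[Any] = []
    for stage in group_into_stages(steps):
        if len(stage) == 1:
            results.append(run_step(stage[0]))
        else:
            with ThreadPoolExecutor(max_workers=len(stage)) as pool:
                # Consuming pool.map blocks until every branch has finished,
                # which acts as the barrier before the next sequential step.
                results.extend(pool.map(run_step, stage))
    return results
```

With a sync step, two parallel steps, and a final report step, this produces three stages of sizes 1, 2, and 1, and the report step starts only after both parallel branches return.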

snapshot_dashboards (v0.0.39)

The dashboard editor's autosave only snapshots on operator edit (rate-limited to one snapshot per 30s). A dashboard left untouched for a week therefore gets no recovery points for that week unless something checkpoints it on a schedule. The snapshot_dashboards step type closes that gap:

trigger:
  cron: "0 2 * * *"                    # 02:00 daily
  timezone: "Asia/Jakarta"
steps:
  - kind: snapshot_dashboards
    project_id: 42                     # defaults to scenario's project_id
    skip_hours: 12                     # don't re-snapshot dashboards snapshotted within 12h
    max_dashboards: 500                # safety cap
    change_note: "Daily auto-snapshot"

Step config:

| Field | Default | Meaning |
| --- | --- | --- |
| project_id | scenario's project | Project whose dashboards get snapshotted. |
| skip_hours | 12 | If a dashboard was snapshotted within this many hours, skip it (idempotency on re-runs). |
| max_dashboards | 500 | Safety cap to avoid runaway scans. |
| change_note | "Daily auto-snapshot" | Stored in dashboard_revisions.change_note. |
| author_user_id | 0 | System author. Override to attribute the snapshot to a specific service user. |

With this step active and a daily cron, every dashboard in the project (up to the max_dashboards cap) has a recovery point within 24h regardless of edit activity. The Dashboards → History drawer shows scheduled snapshots alongside editor autosaves; the change_note distinguishes them.
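The skip_hours and max_dashboards rules can be read as a simple selection filter. The sketch below is one possible reading, not the platform's code: `dashboards_to_snapshot` and its input mapping are hypothetical, and it assumes the cap limits how many dashboards are scanned rather than how many are snapshotted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def dashboards_to_snapshot(
    last_snapshot_at: dict[int, Optional[datetime]],
    now: datetime,
    skip_hours: int = 12,
    max_dashboards: int = 500,
) -> list[int]:
    """Return ids of dashboards that are due for a scheduled snapshot.

    last_snapshot_at maps dashboard id -> time of its latest revision,
    or None if the dashboard has never been snapshotted.
    """
    cutoff = now - timedelta(hours=skip_hours)
    # Assumption: the safety cap bounds the scan itself.
    scanned = sorted(last_snapshot_at.items())[:max_dashboards]
    return [
        dashboard_id
        for dashboard_id, snapped_at in scanned
        if snapped_at is None or snapped_at < cutoff
    ]


now = datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc)
latest = {
    1: now - timedelta(hours=1),   # snapshotted recently: skipped
    2: now - timedelta(hours=13),  # older than skip_hours: due
    3: None,                       # never snapshotted: due
}
due = dashboards_to_snapshot(latest, now)
```

Because recently snapshotted dashboards are skipped, re-running the step shortly after a successful run selects nothing, which is the idempotency property the skip_hours field describes.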

send_email_report (v0.0.38)

The reports surface used to live in its own table (report_schedules) with its own cron parser, its own delivery worker, and its own audit log. v0.0.38 folded it into the scheduler:

  • Definition. A send_email_report step in a scenario takes a dashboard_ids array, an optional time_filter override, a recipient list, and a delivery format (pdf or html).
  • Rendering. The platform renders each dashboard via the same engine that powers the public-share path; the result is a PDF or inline HTML email body.
  • Delivery. SMTP via the org's configured mail connector (see Connectors → Email). Failed delivery retries up to 3× with exponential backoff before logging an error.
  • Filters. The time_filter override is applied before rendering — useful for "every Monday email last week's numbers" patterns. Without it, the dashboard's own default filter applies.
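The delivery rule above (up to 3 retries with exponential backoff, then an error log) follows a common pattern that can be sketched as below. The helper name `deliver_with_retry`, the logger name, and the 2-second base delay are assumptions for illustration; the source states only the retry count and that the backoff is exponential.

```python
import logging
import smtplib
import time
from typing import Callable

logger = logging.getLogger("scheduler.email")  # hypothetical logger name


def deliver_with_retry(
    send: Callable[[], None],
    max_retries: int = 3,
    base_delay_s: float = 2.0,  # assumed; the actual base delay is not documented
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send(); on failure retry up to max_retries times, doubling the wait."""
    for attempt in range(max_retries + 1):  # one initial attempt plus the retries
        try:
            send()
            return True
        except (smtplib.SMTPException, OSError) as exc:
            if attempt == max_retries:
                logger.error("email delivery failed after %d retries: %s", max_retries, exc)
                return False
            sleep(base_delay_s * 2 ** attempt)  # 2s, 4s, 8s with the defaults
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waits; a caller would normally leave it at the default.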

Existing report_schedules rows from pre-v0.0.38 installs are migrated automatically on first boot of the new binary. The migration creates one scenario per old row, each with a single send_email_report step.

Scenario templates

A template defines a scenario shape that can be instantiated per project. Use templates when:

  • The same recipe build runs across many projects (e.g. nightly mart refresh).
  • A reporting cadence is org-wide policy (e.g. weekly leadership digest).
  • A new project should automatically inherit a baseline schedule.

Template parameters are referenced as {{ param.name }} in step configs. Instantiating the template prompts for parameter values.

# Example template
slug: nightly-mart-refresh
display_name: Nightly mart refresh
parameters:
  - name: mart_recipe
    type: recipe_ref
trigger:
  cron: "0 2 * * *"                    # 02:00 daily
steps:
  - kind: run_recipe
    recipe: "{{ param.mart_recipe }}"
  - kind: send_email_report
    dashboard_ids: []                  # populated per project
    recipients:
      - "{{ project.owner_email }}"
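To make the placeholder mechanics concrete, here is a minimal sketch of how `{{ param.name }}` style references could be resolved when a template is instantiated. The platform's real rendering engine is not documented here; the `render` function, the regex, and the context layout are assumptions for illustration only.

```python
import re
from typing import Any

# Matches "{{ dotted.name }}" with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(value: Any, context: dict[str, Any]) -> Any:
    """Recursively substitute placeholders inside a step config."""
    if isinstance(value, dict):
        return {key: render(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [render(item, context) for item in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, None pass through unchanged

    def lookup(match: re.Match) -> str:
        node: Any = context
        for part in match.group(1).split("."):
            if not isinstance(node, dict) or part not in node:
                raise KeyError(f"undefined template value: {match.group(1)}")
            node = node[part]
        return str(node)

    return _PLACEHOLDER.sub(lookup, value)


context = {
    "param": {"mart_recipe": "sales_mart"},           # values supplied at instantiation
    "project": {"owner_email": "owner@example.com"},  # hypothetical project attributes
}
step = {"kind": "run_recipe", "recipe": "{{ param.mart_recipe }}"}
rendered = render(step, context)
```

Raising on an undefined name, rather than leaving the placeholder in place, surfaces a missing parameter at instantiation time instead of at the first scheduled run.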

Per-project enable matrix

Admin → Scheduler templates lists every template alongside a project-by-project on/off matrix. Toggling a cell enables or disables the template's scenario for that project. Two patterns:

  • Default-on for new projects. Mark a template default_enabled: true; new projects inherit it on creation.
  • Manual rollout. Leave default_enabled: false and flip the matrix per project as you roll out.

Disabling a template for a project pauses but does not delete the project's instantiated scenario; re-enabling resumes it without losing run history.
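The matrix semantics (default inheritance, and pause without delete) can be modeled in a small sketch. The `Template` class and its fields are hypothetical and exist only to illustrate the behavior described above; the platform's storage model may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    slug: str
    default_enabled: bool = False
    # One row of the matrix: project_id -> enabled flag.
    enabled_for: dict[int, bool] = field(default_factory=dict)
    # Run history per project; never touched by enable/disable.
    run_history: dict[int, list[str]] = field(default_factory=dict)

    def on_project_created(self, project_id: int) -> None:
        """New projects inherit the template's default."""
        self.enabled_for[project_id] = self.default_enabled

    def set_enabled(self, project_id: int, enabled: bool) -> None:
        """Flip one matrix cell; disabling pauses the scenario, it does not delete it."""
        self.enabled_for[project_id] = enabled

    def should_fire(self, project_id: int) -> bool:
        return self.enabled_for.get(project_id, False)


template = Template("nightly-mart-refresh", default_enabled=True)
template.on_project_created(42)
template.run_history.setdefault(42, []).append("run-1")
template.set_enabled(42, False)  # paused; history stays
```

Re-enabling project 42 afterwards makes `should_fire(42)` true again with `run_history[42]` still intact.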

Schedule of record

The previous pipeline_schedules table is gone as of v0.0.38. Any code or external integration that still queries it will fail because the table no longer exists. The platform also enforces this at boot: its startup check (fix(scheduler): schema drift check at startup) explicitly looks for the dropped tables and aborts with a schema-drift error if they still exist, since their presence indicates an incomplete upgrade.

If you have downstream tooling reading the old tables, point it at the new APIs:

| Old | New |
| --- | --- |
| SELECT … FROM pipeline_schedules | GET /api/scheduler/scenarios |
| SELECT … FROM report_schedules | GET /api/scheduler/scenarios?step_kind=send_email_report |
| INSERT INTO pipeline_schedules … | POST /api/scheduler/scenarios |
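For tooling that used to read the tables directly, the replacement read path might look like the sketch below. Only the endpoint paths and the step_kind query parameter come from the table above; the base URL, the bearer-token header, and the JSON response shape are assumptions that depend on your deployment.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def scenarios_url(base_url: str, step_kind: Optional[str] = None) -> str:
    """Build the scenarios listing URL, optionally filtered by step kind."""
    url = base_url.rstrip("/") + "/api/scheduler/scenarios"
    if step_kind:
        url += "?" + urlencode({"step_kind": step_kind})
    return url


def list_scenarios(base_url: str, token: str, step_kind: Optional[str] = None) -> Any:
    """Fetch scenarios over HTTP instead of SELECTing from the dropped tables."""
    request = Request(
        scenarios_url(base_url, step_kind),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response)  # response shape is not documented here


# Equivalent of the old "SELECT ... FROM report_schedules" query:
report_url = scenarios_url("https://honeyframe.example.com", "send_email_report")
```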

Heartbeat and observability

The scheduler runs as a separate worker process (hub-platform-scheduler.service) that heartbeats to /api/pipeline/scheduler/health every 30 seconds. The heartbeat updates a scheduler_workers row with last_seen_at, worker_id, and current_scenario_id.

Health probes:

  • /api/pipeline/scheduler/health returns {"alive": true, "last_seen": "..."} if the worker has heartbeated within 90s, else 503.
  • The Operations dashboard's Scheduler tile colors green / amber / red on heartbeat freshness.
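The 90-second freshness rule behind the health probe can be expressed as a small pure function. This is a sketch of the documented behavior, not the endpoint's source: `health_response` is a hypothetical name, and the body returned alongside the 503 is an assumption, since the source specifies only the status code for the stale case.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(seconds=90)  # three missed 30-second heartbeats


def health_response(last_seen_at: datetime, now: datetime) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for the worker health probe."""
    alive = now - last_seen_at <= STALE_AFTER
    body = {"alive": alive, "last_seen": last_seen_at.isoformat()}
    return (200 if alive else 503), body


now = datetime(2024, 1, 1, 0, 2, 0, tzinfo=timezone.utc)
fresh_status, fresh_body = health_response(now - timedelta(seconds=30), now)
stale_status, stale_body = health_response(now - timedelta(seconds=91), now)
```

With a 30-second heartbeat interval, the 90-second threshold tolerates two missed beats before the probe starts returning 503.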

Common failure modes:

  • Heartbeat stale, worker process running — Worker is alive but couldn't write to scheduler_workers. Usually a DB lock; check pg_stat_activity for long-running transactions.
  • trigger_config 500 — A scenario's trigger config failed validation at runtime (cron string parse error, undefined parameter). Fixed in v0.0.38; if you see it on v0.0.38+, the offending config is in the response body.
  • Silent FE failure — Pre-v0.0.38, the frontend swallowed scheduler errors in a global try/catch. Removed in v0.0.38; the UI now surfaces errors in the scenario detail panel.

API reference

| Endpoint | Description |
| --- | --- |
| GET /api/scheduler/scenarios | List scenarios for the current project. |
| POST /api/scheduler/scenarios | Create a scenario. |
| GET /api/scheduler/scenarios/{id} | Scenario config + last 100 runs. |
| PATCH /api/scheduler/scenarios/{id} | Update trigger, steps, or recipients. |
| DELETE /api/scheduler/scenarios/{id} | Remove. Run history is preserved. |
| POST /api/scheduler/scenarios/{id}/run | Trigger an out-of-band run, ignoring the scenario's trigger config. |
| GET /api/scheduler/scenarios/{id}/runs | Paginated run history. |
| GET /api/scheduler/templates | List org-wide templates. |
| POST /api/scheduler/templates | Create a template (admin). |
| POST /api/scheduler/templates/{slug}/enable/{project_id} | Enable for a project. |
| POST /api/scheduler/templates/{slug}/disable/{project_id} | Disable for a project. |
| GET /api/pipeline/scheduler/health | Worker heartbeat. |
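As a usage illustration for the write endpoints, the sketch below builds (but does not send) the requests for creating a scenario and triggering an out-of-band run. The JSON payload mirrors the YAML examples on this page, which is an assumption; the accepted request body, the auth header, and the base URL are not specified here and should be checked against your instance.

```python
import json
from urllib.request import Request


def _headers(token: str) -> dict[str, str]:
    # Assumed auth scheme; adjust to whatever your deployment uses.
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def create_scenario_request(base_url: str, token: str, scenario: dict) -> Request:
    """POST /api/scheduler/scenarios with the scenario config as JSON."""
    return Request(
        f"{base_url.rstrip('/')}/api/scheduler/scenarios",
        data=json.dumps(scenario).encode("utf-8"),
        method="POST",
        headers=_headers(token),
    )


def run_now_request(base_url: str, token: str, scenario_id: int) -> Request:
    """POST /api/scheduler/scenarios/{id}/run to fire outside the trigger config."""
    return Request(
        f"{base_url.rstrip('/')}/api/scheduler/scenarios/{scenario_id}/run",
        data=b"",
        method="POST",
        headers=_headers(token),
    )


scenario = {  # hypothetical payload shape, modeled on the YAML examples
    "trigger": {"cron": "0 2 * * *", "timezone": "Asia/Jakarta"},
    "steps": [{"kind": "snapshot_dashboards", "skip_hours": 12}],
}
create_req = create_scenario_request("https://honeyframe.example.com", "TOKEN", scenario)
run_req = run_now_request("https://honeyframe.example.com", "TOKEN", 7)
```

Either request can then be sent with `urllib.request.urlopen(create_req, timeout=10)`.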

Gotchas

  • Schema drift hard-fails startup. If the upgrade left pipeline_schedules or report_schedules behind, the platform refuses to boot. Drop them with the migration script (paas/scripts/migrations/drop_pipeline_schedules.sql) before retrying.
  • Cron timezone is UTC by default. Override per scenario via trigger.timezone: "Asia/Jakarta" or set the org default in Project Settings → Defaults.
  • send_email_report does not honor dashboard public-share permissions. It renders as the scenario's owner — if the owner can't see a card, the email won't include it. Use a service-user owner for reports that span dashboards a single human user couldn't access.
  • Parallel steps run inside the same DB transaction by default. Long-running parallel steps can pile up locks; mark them transactional: false to commit per step.
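The timezone gotcha is easy to check with a quick calculation. The sketch below, using only the Python standard library, shows how far a "0 2 * * *" cron drifts when the timezone override is omitted; the helper names are hypothetical, and the fixed-offset fallback is only there for hosts without a tz database.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def load_zone(name: str, fallback_offset_hours: int) -> tzinfo:
    """Load an IANA zone, falling back to a fixed offset if tzdata is missing."""
    try:
        return ZoneInfo(name)
    except ZoneInfoNotFoundError:
        return timezone(timedelta(hours=fallback_offset_hours))


def fire_time_in_utc(year: int, month: int, day: int, hour: int, minute: int, zone: tzinfo) -> datetime:
    """Convert a local wall-clock fire time to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=zone)
    return local.astimezone(timezone.utc)


jakarta = load_zone("Asia/Jakarta", 7)  # Jakarta is UTC+7 with no DST

# "0 2 * * *" with timezone: "Asia/Jakarta" fires at 19:00 UTC the previous day.
with_override = fire_time_in_utc(2024, 3, 10, 2, 0, jakarta)

# The same cron with the UTC default fires at 09:00 Jakarta time instead.
utc_default_seen_locally = datetime(2024, 3, 10, 2, 0, tzinfo=timezone.utc).astimezone(jakarta)
```

A nightly job intended for 02:00 local time therefore runs at 09:00 local time if the override is forgotten, which is usually inside business hours.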