Version: v0.0.29

Flow

The Flow is the canvas where a project's datasets, recipes, and AI agents are arranged into a dependency graph. Most operator workflows start there: pick the dataset you want, see what feeds it, see what depends on it, and edit the recipe that produces it without leaving the canvas.

Conceptually it sits in the same family as Dataiku's Flow or Airflow's DAG view. Operationally it's a React Flow canvas backed by paas/backend/routers/flow_codegen.py (visual → dbt SQL) and paas/backend/routers/flow_ai.py (AI nodes).

Node types

| Node | Visual | Meaning |
| --- | --- | --- |
| Dataset | Square | A queryable dataset — source, intermediate, or mart. |
| Visual recipe | Round | Block-based recipe (prepare, filter, join, group_by, aggregate, window, formula, stack). Compiled to dbt SQL. |
| SQL recipe | Round | Free-form dbt SQL model. |
| Python recipe | Round | Standalone Python — opaque to dbt. |
| Notebook recipe | Round | Jupyter-style notebook recipe. |
| Sync recipe | Round | Standalone ingestion node — pulls from a connector, writes to a dataset. |
| AI recipe | Round | Agent invocation, embedding, or knowledge-base feed. |
| Zone | Group container | Color-coded grouping of nodes. Visual only — does not constrain execution. |

Edges are drawn input → recipe → output. A recipe with multiple inputs (e.g. a join) has one edge per input.
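
The input → recipe → output convention can be pictured as a small adjacency list. This is an illustrative sketch only — the node names and the in-memory representation are hypothetical, not the platform's actual data model:

```python
# Hypothetical flow graph: edges always run input -> recipe -> output.
# A join recipe gets one incoming edge per input dataset.
edges = [
    ("orders_raw", "join_orders_customers"),      # first input -> recipe
    ("customers_raw", "join_orders_customers"),   # second input, same recipe
    ("join_orders_customers", "orders_enriched"), # recipe -> output dataset
]

def inputs_of(recipe: str) -> list[str]:
    """All dataset nodes feeding a given recipe node."""
    return [src for src, dst in edges if dst == recipe]

join_inputs = inputs_of("join_orders_customers")  # both upstream datasets
```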

Building a Flow

There are three ways nodes appear on the canvas:

  1. Add a connector and create a dataset. Each new dataset shows up as a leaf node with no upstream.
  2. Add a recipe via the + New Recipe button on a dataset, or by dragging a block from the Add Item palette. The recipe and its output dataset both appear.
  3. Drag-and-drop a standalone node — useful for Python, Notebook, Sync, and AI recipes that aren't tied to a single upstream.

Drag a node onto a Zone to assign it. Zones don't constrain execution; they're purely organizational.

The visual recipe builder

Visual recipes are the heart of the Flow. Click + New Recipe → Visual on any dataset and the right-side panel opens with the block palette. Each block translates to a piece of dbt SQL, which the platform compiles and writes to models/<layer>/<recipe_name>.sql on save.

Block types and what they produce:

| Block | Generated SQL |
| --- | --- |
| prepare | `SELECT col1, col2, CAST(col3 AS NUMERIC) AS col3, …` — column rename, cast, drop, derive. |
| filter | `… WHERE col > value AND col2 IS NOT NULL` — chained predicates with AND/OR conjunctions. |
| join | `LEFT/INNER/RIGHT/FULL/CROSS JOIN other ON …` — multi-key joins, prefix-based collision handling. |
| group_by | `GROUP BY col1, col2` — paired with the next aggregate block. |
| aggregate | `SUM/COUNT/AVG/MIN/MAX/STDDEV/VARIANCE(col) AS alias` |
| window | `ROW_NUMBER/RANK/DENSE_RANK/SUM/AVG/COUNT/MIN/MAX OVER (PARTITION BY … ORDER BY …)` |
| formula | Arbitrary scalar SQL expression as a derived column. |
| stack | `UNION ALL` / `UNION` of two model refs. |
| sql | Pass-through — for steps the visual builder can't express. |
| sync | Read from a connector → write to a target dataset. Not part of the dbt build; runs as a Python step. |

Refs between visual blocks resolve to dbt's {{ ref('upstream_model') }}; refs to external sources resolve to {{ source('schema', 'table') }}. The canvas always has the live, resolved SQL one tab away (Preview SQL), so you can copy-paste into psql to debug.
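To make the block → SQL mapping concrete, here is a toy compiler for a filter block. The real codegen lives in paas/backend/routers/flow_codegen.py; the block-config shape below is an assumption for illustration, not the platform's schema:

```python
# Toy sketch: compile a filter block config into dbt SQL with a {{ ref() }}.
# Config shape (column/op/value keys) is hypothetical.
def compile_filter_block(upstream: str, predicates: list[dict]) -> str:
    clauses = [f"{p['column']} {p['op']} {p['value']}" for p in predicates]
    where = " AND ".join(clauses)
    return f"SELECT * FROM {{{{ ref('{upstream}') }}}} WHERE {where}"

sql = compile_filter_block(
    "stg_orders",
    [
        {"column": "amount", "op": ">", "value": "100"},
        {"column": "status", "op": "=", "value": "'paid'"},
    ],
)
```

Refs to another visual block would swap the `ref()` for a `source()` call when the upstream is an external table, matching the resolution rules above.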

POST /api/flow/preview-sql runs the generated SQL against the connector and returns a 100-row sample without persisting the recipe — useful for iterating before you save.
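A preview call might be sketched as follows. The payload field names here (`connector_id`, `sql`, `limit`) are assumptions — check the actual request schema before relying on them:

```python
import json

# Hypothetical preview-sql payload; field names are illustrative,
# not taken from the real API schema.
def preview_payload(connector_id: str, sql: str, limit: int = 100) -> str:
    return json.dumps({"connector_id": connector_id, "sql": sql, "limit": limit})

payload = preview_payload("warehouse", "SELECT 1 AS x")

# With the backend running, something like:
#   requests.post("http://localhost:8000/api/flow/preview-sql", data=payload)
# would return up to `limit` sample rows without persisting the recipe.
```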

Reverse-engineering hand-written SQL

POST /api/flow/parse-to-visual attempts to reverse a dbt SQL model into block configs. It's best-effort: simple SELECT … FROM ref(...) WHERE … JOIN … patterns parse cleanly; complex CTEs, window functions inside QUALIFY clauses, and pivot patterns fall back to a single sql block.

Use it to bring legacy dbt projects under visual editing. Don't expect a round-trip to be byte-identical — saving the parsed blocks regenerates SQL in the platform's canonical style.
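Two toy inputs illustrate the distinction. Per the behavior described above, the first shape should decompose into blocks while the CTE-based one falls back to a single sql block (model names here are made up):

```python
# Parses cleanly: flat SELECT ... FROM ref(...) WHERE ... shape.
simple = "SELECT id, amount FROM {{ ref('stg_orders') }} WHERE amount > 0"

# Falls back to one `sql` block: window function inside a CTE.
complex_sql = """
WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
  FROM {{ ref('stg_events') }}
)
SELECT * FROM ranked WHERE rn = 1
"""
```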

Running the Flow

Three execution surfaces:

  • Single-node run — Right-click a recipe → Run. Runs only that recipe.
  • Subgraph run — Right-click a dataset → Build → choose upstream, downstream, or full lineage. Runs all the recipes needed to bring that dataset up-to-date.
  • Scheduled run — On the Schedules page, attach a cron expression to a build target. The scheduler picks it up.
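A scheduled run pairs a build target with a standard five-field cron expression (minute, hour, day-of-month, month, day-of-week). The dict shape below is illustrative, not the Schedules API schema:

```python
# Illustrative schedule: rebuild 'orders_mart' every day at 02:30.
# Field names ("target", "cron") are assumptions, not the real schema.
schedule = {"target": "orders_mart", "cron": "30 2 * * *"}

fields = schedule["cron"].split()
minute, hour, dom, month, dow = fields  # standard 5-field cron layout
```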

Run progress streams to the canvas as colored borders on the running nodes (in-progress / success / failure). Click any node mid-run to see the live log.

Zones

Zones are visual groupings — color-coded rectangles that bundle related nodes. They have no effect on execution, lineage, or permissions. The platform ships with ten zone colors (orange, blue, purple, green, brown, red, amber, pink, indigo, gray) and lets you create custom zone templates.

Common patterns:

  • Bronze / Silver / Gold zones — three zones aligned with the Lakehouse data quality tiers. Raw ingest into Bronze, cleaned into Silver, modeled into Gold.
  • One vertical per Zone — for project-of-projects layouts, a Zone per business domain.
  • Sandbox zone — experimental nodes you don't want in the main lineage. Move them out when promoted.

Two assignment paths: drag-and-drop a node onto a zone, or use Auto-assign (POST /api/flow/zones/auto-assign) which groups nodes by layer and proximity.

The canvas layout (node positions) persists per-project via POST /api/flow/zone-layout. Without an explicit layout, nodes are auto-positioned at first render.
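A layout payload boils down to per-node canvas coordinates keyed by node id. The field names below are assumptions — the actual schema of POST /api/flow/zone-layout may differ:

```python
# Hypothetical zone-layout payload: x/y canvas positions per node id.
layout = {
    "project_id": "demo",
    "positions": {
        "stg_orders": {"x": 0, "y": 120},
        "join_orders_customers": {"x": 240, "y": 120},
    },
}
# Nodes missing from "positions" would be auto-placed at first render.
```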

AI recipes

AI nodes appear on the same canvas as data recipes:

  • Agent recipe — references a configured agent (Agent Builder). Run it to invoke the agent against a dataset row stream.
  • Embed recipe — produces a knowledge-base index from a dataset's text column. Persists to the configured vector store (Connectors → Vector stores).
  • Knowledge base — appears as a node so you can see which agents and dashboards consume it.

AI nodes share the run / schedule / log surface with data recipes; mixing them in the same subgraph build is supported.

Lineage and impact analysis

The Flow shows direct edges. For deeper analysis:

  • Lineage Explorer (Operations → Lineage) shows the full upstream/downstream subgraph for any node, with column-level granularity.
  • Asset References panel — sidebar on every dataset / dashboard / recipe edit page. Lists downstream consumers of the asset — answers "what breaks if I change this?" without leaving the editor.

Visual lineage from a recipe's block config (without compiling and running it) is available via POST /api/flow/visual-lineage.
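Deriving lineage from block config alone can be pictured with a prepare block's rename map. This is a toy sketch — the real logic behind POST /api/flow/visual-lineage and its config shape may differ:

```python
# Toy column lineage from a prepare block's rename map
# (output column -> source column). Config shape is hypothetical.
def prepare_lineage(renames: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source_column, output_column) edges for a prepare block."""
    return [(src, dst) for dst, src in renames.items()]

lineage_edges = prepare_lineage({"customer_name": "name", "total_usd": "amount"})
```

No compilation or query execution is needed: the block config already names every input and output column.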

API reference

| Endpoint | Description |
| --- | --- |
| `GET /api/projects/{id}/flow` | Current node + edge state for a project. |
| `POST /api/flow/generate-sql` | Compile visual block config → dbt SQL. |
| `POST /api/flow/parse-to-visual` | Best-effort SQL → block config. |
| `POST /api/flow/preview-sql` | Run generated SQL, return sample rows. Does not save. |
| `POST /api/flow/visual-lineage` | Column-level lineage from block config. |
| `GET /api/flow/recipe/{model_name}` | Fetch the visual recipe for a dbt model. |
| `POST /api/flow/save-recipe-steps` | Persist visual block config as a dbt model. |
| `POST /api/pipeline/run` | Trigger a build target. |
| `GET /api/pipeline/runs` | List past runs with status, duration, log refs. |
| `GET /api/lineage/{dataset}` | Full lineage subgraph with column-level edges. |
| `GET /api/flow/zones` | List zone templates. |
| `POST /api/flow/zones/auto-assign` | Auto-assign nodes to zones. |
| `POST /api/flow/zone-layout` | Persist canvas positions. |
| `GET /api/flow/recipe-templates` | List reusable recipe templates. |
| `POST /api/flow/python-recipe/run` | Execute a Python recipe. |
| `POST /api/flow/ai-recipes/{id}/run` | Execute an AI recipe. |

Performance characteristics

The canvas renders client-side. Node positions are stored on each row in the DB (not auto-laid-out on each load), which makes initial render O(n). For projects with < 200 nodes the canvas is interactive at 60 fps. Above ~500 nodes layout becomes the bottleneck — collapse zones, or use the Search sidebar to jump directly to a node by name.

Gotchas

  • Python and Notebook recipes are opaque to dbt — the Flow shows them as nodes and tracks their input/output edges, but they don't participate in dbt run's topological build. Schedule them separately.
  • Sync recipes are not dbt models — they are Python steps that move bytes from a connector to a dataset. Schedule them on the Schedules page; subgraph builds do not include them.
  • parse-to-visual is best-effort. Complex SQL falls back to a single sql block. Don't expect round-trip byte-identity.
  • AI recipes require backing rows. Agents must be published from Agent Builder, knowledge bases must be created on the Knowledge tab — adding the node on the Flow canvas references them, it does not create them.