Skip to main content
Version: Next

Recipes

A recipe transforms one or more input datasets into an output dataset. Honeyframe's recipe model splits transforms into two layers:

  • Visual recipes — a guided UI that captures the operator's intent (filter rows, join two tables, group and aggregate). The visual recipe compiles to dbt SQL behind the scenes.
  • Code recipes — direct SQL or Python that the user writes. The platform doesn't try to interpret the code; it just runs it.

Both produce dbt models, so the lineage graph and dbt's incremental-build machinery work identically across recipe types.

On native-engine orgs (see Flow → Build engines) the same recipes run without dbt: visual and SQL recipes are stored as target='sql' with Jinja-free inline SQL and materialized directly. Recipe authoring is identical across both engines.

Recipe types

The platform recognizes the following types. The first six are visual; the rest are code/standalone.

TypeLayerWhat it does
prepareVisualColumn-level operations: rename, drop, cast, derive (formula), filter, window, plus a wide processor catalog (see Prepare processors). The most common recipe — typically the first step on a raw dataset.
joinVisualJoin two datasets on one or more key pairs. Supports inner, left, right, outer.
fuzzy_joinVisualJoin on approximate equality — trigram (pg_trgm), levenshtein (fuzzystrmatch), or exact_normalized. Refuses a keyless predicate and warns about row-explosion.
group_byVisualGroup by columns and aggregate (count, sum, avg, min, max, median).
pivotVisualLong → wide via conditional aggregation. Spread values typed manually or auto-discovered via Fetch from data.
top_nVisualPer-group top-K with ROW_NUMBER().
windowVisualRunning totals / rank / lag-lead via a single window function.
sampleVisualRow subset for fast development — first (LIMIT) or random (ORDER BY RANDOM()).
distinctVisualDeduplicate rows — whole-row or DISTINCT ON (cols) with an optional deterministic tiebreaker.
generate_statisticsVisualColumn profiling — one row per column (fill rate, null/distinct counts, min/max).
stackVisualUNION two or more datasets with the same shape.
sqlCodeFree-form SQL. The user writes a model body; the platform wraps it with the appropriate config.
sql_scriptCodeMulti-statement SQL script run verbatim — see SQL Script recipe.
exportCodeTerminal sink — writes a dataset to a file (csv/parquet/xlsx) in a managed folder.
pythonCodePython script with access to a dataiku-style API for reading inputs and writing outputs. Standalone — does not become a dbt model.
cdcStandaloneChange-data-capture from a source connector. Watermarks tracked in pipeline_runs.
syncStandaloneOne-way sync from one connector to another. Used for ingestion to the Lakehouse.
extract_documentStandaloneOCR / extraction from PDF or image inputs into structured rows.
parse_documentsAgentTranscribe a folder of PDFs/scans into a {file, extracted_text, parsed_json} table, with an optional LLM key-value parse.
embedAgentEmbed text from a dataset into a vector store.
llm_enrichAgentRun an LLM prompt over each row of a dataset and write the response back.
classify_textAgentMap a text column to a category from a fixed label set.
summarize_textAgentSummarize a text column to a word budget.
agentAgentMulti-step agent run with tool calls.
rag_searchAgentRetrieve from a knowledge base and answer a query.

The agent recipe types (embed, llm_enrich, classify_text, summarize_text, agent, rag_search) form the AI surface. They share the same plumbing as visual/code recipes — outputs become datasets — but their inputs include a knowledge base or model selector instead of just upstream datasets.

Anatomy of a visual recipe

Recipes are authored on the Flow canvas. Drag a block from the palette and Honeyframe opens a sidebar for the new block. SQL mode shows a textarea where you write a self-contained query or reference upstream datasets via {{ ref('upstream_model') }}:

Recipe builder with SQL block

Click Create Model. The block compiles and runs immediately; on success the green pill confirms the resulting model file:

Recipe created

The Prepare recipe is the canonical visual example. It exposes four tabs:

  • Columns — select columns to keep, rename, cast types, drop.
  • Filter — boolean expressions on the result rows. Compiled to a WHERE clause.
  • Formula — derive new columns from existing ones. Honeyframe supports a small expression language (numeric ops, coalesce, case, date functions).
  • Window — windowed aggregates (ROW_NUMBER, LAG, LEAD, RANK) partitioned by columns.

When the user saves a Prepare recipe, the platform compiles each tab's intent to a SQL model. The compiled SQL is visible in the recipe editor's SQL tab — useful for debugging or for promoting a visual recipe to a code recipe.

Prepare processors

Beyond the four core tabs, a Prepare recipe is a linear pipeline of discrete processors. Each one emits a derived column (the source is kept) and is compiled to portable SQL that runs on both Postgres and DuckDB. The catalog covers routine cleanup so operators stop dropping to a SQL recipe:

ProcessorWhat it does
split_columnSplit a column into N parts via SPLIT_PART.
fill_nullReplace nulls with a constant (COALESCE).
cast_typeCast a column to a target type, with a cast-on-error flag.
trim_whitespaceTRIM / LTRIM / RTRIM.
normalize_whitespaceCollapse runs of whitespace to a single space and trim ends ("JOHN DOE ""JOHN DOE").
padLPAD / RPAD to a fixed width — e.g. left-pad a code to restore lost leading zeros.
replace_textLiteral or regex find/replace.
concat_columnsConcatenate columns (NULL-safe).
coalesce_columnsFirst non-null across N columns, in user-picked order — fills a value from a fallback column (vs fill_null, which fills from a constant).
extract_patternPull a character class (digits / letters / alphanumeric / custom) out of noisy text, e.g. "+62 812-3456 (HP)""628123456".
map_valuesStandardize a categorical column via a from→to value map (CASE), evaluated in order, with a configurable ELSE default.
hash_columnDeterministic MD5(col::text) — a stable pseudonym / blocking key. Note: a pseudonym, not secure anonymization (MD5 of a known value is reversible by lookup).
flag_validEmit a boolean data-quality flag: not_empty, length_range, or all_digits.
date_extractExtract a date part (year/month/day/…).
round_numberRound a numeric column.
bucket_numericBucket a numeric column into ranges (CASE-WHEN chain, first match wins).
window_row_numberROW_NUMBER() OVER (…).
clean_column_namesSnake-case every source column's alias (an explicit rename still wins).

Code recipes

A SQL recipe lets the user write a dbt model body directly:

SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS revenue,
MAX(created_at) AS last_order_at
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

The {{ ref('...') }} macro resolves to the upstream dataset's compiled name. The recipe editor's input panel shows which datasets are referenced, so the dependency graph stays accurate.

A Python recipe runs as a standalone script under the platform's tenant workspace. The platform exposes a small recipe_runner API for reading inputs and writing the output dataset:

from honeyframe.recipe import inputs, outputs

df = inputs.read("upstream_dataset") # pandas DataFrame
df["enriched"] = df["raw"].apply(my_func)
outputs.write("downstream_dataset", df)

Python recipes cannot be combined with dbt's lineage graph automatically — they're opaque to dbt — so they appear as standalone nodes in the Flow.

SQL Script recipe

A SQL Script recipe runs a multi-statement script verbatim against the warehouse — the author owns the DDL/DML into the output dataset. Statements are split (via sqlparse) and executed in order within a single transaction, so any failure rolls the whole script back. Project variables referenced as :name / ${name} are bound per statement (injection-safe), never string-interpolated. It is authored in a Monaco multi-statement editor with project-variable hints.

Use it for steps the visual builder can't express that still need to be more than one statement — temp tables, multi-step transforms, explicit CREATE/INSERT sequences.

Export recipe

An Export recipe is a terminal sink: it writes its input dataset to a file (csv, parquet, or xlsx) in a managed folder. It produces no downstream dataset — it's the end of a flow branch. Useful for handing data off to systems outside the warehouse.

Running a recipe

Recipes don't run on save. They run when:

  • The user clicks Run in the recipe editor, OR
  • A scheduled job triggers them as part of a Flow run, OR
  • A downstream dataset is queried and Honeyframe detects a stale upstream and runs the recipe to refresh.

Run progress appears live in the recipe editor (pipeline_runs table polled every second). Run history is preserved indefinitely; click any past run to see the compiled SQL, parameters, and stdout.

Lineage

Every recipe registers an edge in the platform's column-level lineage graph. The graph is queryable via the Lineage Explorer (Flow → Lineage) and via /api/lineage. A column-level edge tracks which upstream column each downstream column derives from — invaluable for impact analysis when changing a source.

Migrating from require_role for recipes

Recipe-level access control follows the dataset model: anyone who can read the input datasets can read the recipe; modifying the recipe requires dataset.readwrite (or, in the legacy layer, the editor role) on the recipe's output dataset.

When the dataset permission migration completes (see Permissions Reference), recipes will gain object-level permissions matching their output dataset.