Machine Learning
The ML Lab brings predictive modeling onto the Flow canvas. Four model recipes — Train, Score, Cluster, and Evaluate — sit in the GenAI palette group alongside the AI recipes, so a model fits into a project the same way every other transform does: an input table feeds a recipe block, and the block produces either a saved model or a scored output table.
Models are first-class, governed assets. A Train recipe registers a versioned saved model that is org-scoped, addressable by name, and inspectable from the Models page — no notebook hand-off, no artifacts to move around. Everything you can do from the canvas is also scriptable; see From the API and SDK.
The sklearn / joblib stack runs in an isolated subprocess (the same source-mode pattern as parse_documents), so the modeling libraries never enter the platform's import graph. Operationally this is mostly invisible: model runs execute out-of-process and are subject to a 60-minute ceiling.
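The out-of-process pattern can be sketched with the standard library alone. This is a minimal illustration, not Honeyframe's actual runner: `run_isolated` and the worker command are hypothetical, and only the 60-minute ceiling comes from the text above.

```python
import subprocess
import sys

# The 60-minute ceiling is from the text above; the rest is illustrative.
RUN_TIMEOUT_SECONDS = 60 * 60


def run_isolated(args, timeout=RUN_TIMEOUT_SECONDS):
    """Run a worker command in a child process and return its stdout.

    The parent never imports the worker's libraries; it only sees text output.
    Raises subprocess.TimeoutExpired if the ceiling is hit and
    subprocess.CalledProcessError if the worker exits non-zero.
    """
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        timeout=timeout,      # the child is killed when this elapses
        check=True,           # a non-zero exit becomes an exception
    )
    return completed.stdout


# Stand-in worker: a real one would import sklearn/joblib inside the child.
result = run_isolated([sys.executable, "-c", "print('trained')"])
```

Because the child is a separate interpreter, a crash or a runaway fit in the modeling code cannot take down the parent process; it shows up as an exception the caller can record against the run.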
The model recipes
Drop a block from the GenAI group of the Add Item palette; Honeyframe opens the ML recipe sidebar, one surface that adapts to the four actions:
| Recipe | Input | Produces |
|---|---|---|
| 🎓 Train | A table | A new version of a saved model. |
| 🎯 Score | A table + a saved model | An output table with predictions appended. |
| 🧩 Cluster | A table | A trained clustering model, optionally a labeled output table. |
| 📊 Evaluate | A table + a saved model | Metrics computed against the supplied rows. |
Like every recipe, an ML block doesn't run on save — click Save & Run to execute it. Each block reports its result inline: a model id, a headline metric, and (for Score/Cluster) the output table name. Runs are recorded in job_runs like any other recipe, and errors surface back into the run record.
Training a model
A Train recipe takes an input table and a configuration:
- Task — the kind of model (classification or regression).
- Algorithm — the learner to fit (e.g. random_forest).
- Target — the column to predict.
- Features — the columns to learn from. Feature types are split automatically; very high-cardinality columns are dropped.
- Test size — the holdout fraction used to compute metrics (e.g. 0.2).
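The automatic feature handling mentioned in the list above might look roughly like the following. This is a sketch under assumptions: the function name, the type rule (all non-null values numeric), and the `MAX_CATEGORIES` cutoff are all hypothetical, since the text does not state the actual threshold or rules.

```python
from numbers import Number

# Hypothetical cutoff: the text says very high-cardinality columns are
# dropped but does not give the threshold.
MAX_CATEGORIES = 50


def split_features(rows, features, max_categories=MAX_CATEGORIES):
    """Sort feature columns into numeric, categorical, and dropped lists.

    rows is a list of dicts; missing or None values are ignored when
    deciding a column's type.
    """
    numeric, categorical, dropped = [], [], []
    for name in features:
        values = [row[name] for row in rows if row.get(name) is not None]
        # bool counts as a Number in Python, so exclude it explicitly.
        if values and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        ):
            numeric.append(name)
        elif len(set(values)) > max_categories:
            dropped.append(name)  # too many distinct values to one-hot encode
        else:
            categorical.append(name)
    return numeric, categorical, dropped
```

An identifier-like column such as a customer id would typically land in the dropped list, since one-hot encoding it would add one column per row without adding signal.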
On run, the recipe fits a preprocessing-plus-model pipeline (imputation and one-hot encoding travel with the model, so scoring later needs no separate prep), evaluates it on the holdout split, and registers a new version of the saved model under its name. The registered row carries the task, algorithm, target, feature list, training-row count, computed metrics, and feature importance (aggregated back from the one-hot columns to the source features). Training is capped at 250k rows.
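The importance roll-up described above can be illustrated as follows. This is a sketch, assuming the aggregation is a simple sum and that a mapping from encoded column to source feature is available; `aggregate_importance` and both argument names are hypothetical.

```python
from collections import defaultdict


def aggregate_importance(encoded_importance, source_of, top_n=None):
    """Sum per-encoded-column importances back onto their source features.

    encoded_importance: {encoded_column: importance}
    source_of: {encoded_column: source_feature}; columns missing from the
        mapping are treated as their own source (e.g. numeric features).
    Returns (feature, importance) pairs ranked from most to least important.
    """
    totals = defaultdict(float)
    for column, importance in encoded_importance.items():
        totals[source_of.get(column, column)] += importance
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n] if top_n is not None else ranked


ranked = aggregate_importance(
    {"tenure": 0.4, "plan_basic": 0.25, "plan_pro": 0.25, "monthly_charges": 0.1},
    {"plan_basic": "plan", "plan_pro": "plan"},
)
```

Here the two one-hot columns for `plan` combine into a single entry, so the report ranks source features rather than individual category indicators.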
Re-running Train with the same model name adds another version rather than overwriting — version history is preserved and you choose which one is active.
Scoring and clustering
Score applies an existing saved model to a table. Point the block at the input table, pick the saved model from the model picker (populated from /api/models), and name the output table. The recipe loads the model's bundled pipeline and writes predictions into the output dataset, which appears on the canvas as a normal downstream table.
Cluster fits an unsupervised model on the input table. Set k (the number of clusters) and, optionally, an output table to receive each row's cluster label. Clustering produces a trained model the same way Train does, so it shows up on the Models page.
Evaluating
An Evaluate recipe runs a saved model against a supplied table — a holdout set, a fresh extract, or any table with the model's columns — and computes metrics on it, without writing a scored output. Use it to check a model against new data or to compare versions before promoting one.
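To make "computes metrics" concrete, here is a small sketch of scoring predictions against known targets. The text does not say which metrics Evaluate reports, so accuracy and RMSE are stand-ins, and the `evaluate` function is hypothetical.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")


def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted label equals the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    _check(y_true, y_pred)
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def evaluate(task, y_true, y_pred):
    """Return a metrics dict for the task (stand-in metrics, see lead-in)."""
    if task == "classification":
        return {"accuracy": accuracy(y_true, y_pred)}
    if task == "regression":
        return {"rmse": rmse(y_true, y_pred)}
    raise ValueError(f"unknown task: {task!r}")
```

Running the same function over two versions' predictions on one table gives directly comparable numbers, which is the comparison the Evaluate recipe is meant to support before promoting a version.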
The Models page
The Models page lives under the Intelligence section of the nav (/models). It lists a project's saved models with versions grouped by name, each carrying an active badge, a task badge, and the model's headline metric. project_id defaults to your active project, so the page shows the project you're in.
Click a model to open the report drawer:
- Metrics grid — the model's computed metrics.
- Feature-importance bars — the top drivers, ranked (top 12).
- Provenance — the version's training origin (the on-disk artifact path is never exposed).
- Activate — promote a version to be the active one for that name.
- Delete — guarded: deleting the active version is refused (409) while sibling versions exist; the artifact directory is cleaned up best-effort.
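The activate and guarded-delete rules above can be modeled in a few lines. This is an in-memory sketch, not the platform's implementation: `ModelRegistry`, `VersionConflict`, and the choice to make the first version active by default are all assumptions.

```python
class VersionConflict(Exception):
    """Stands in for the 409 response on a refused delete."""


class ModelRegistry:
    def __init__(self):
        self._versions = {}  # name -> list of live version numbers
        self._next = {}      # name -> next version number to hand out
        self._active = {}    # name -> active version number

    def register(self, name):
        """Add a new version under name; earlier versions are kept."""
        version = self._next.get(name, 1)
        self._next[name] = version + 1
        self._versions.setdefault(name, []).append(version)
        # Assumption: the first version of a name starts out active.
        self._active.setdefault(name, version)
        return version

    def activate(self, name, version):
        if version not in self._versions.get(name, []):
            raise KeyError((name, version))
        self._active[name] = version

    def delete(self, name, version):
        versions = self._versions.get(name, [])
        if version not in versions:
            raise KeyError((name, version))
        # Guard: the active version cannot go while siblings exist.
        if self._active.get(name) == version and len(versions) > 1:
            raise VersionConflict(f"{name} v{version} is active")
        versions.remove(version)
        if not versions:
            self._versions.pop(name)
            self._active.pop(name, None)
```

The guard means a name with several versions always has an active one: to remove the current active version, promote a sibling first, then delete.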
From the API and SDK
Models are authored through the same machinery as other AI recipes. A Train/Score/Cluster/Evaluate block is an entry in /api/flow/ai-recipes (create → set run_config → run), and the resulting saved models are read and managed through /api/models.
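The create, set run_config, run sequence can be laid out as three requests. This sketch only builds the request descriptions and sends nothing; `recipe_type` and `run_config` come from this page, while the `name` field and the key names inside the run config are assumptions that mirror the Train settings.

```python
import json

BASE = "/api/flow/ai-recipes"


def train_requests(recipe_id, name, run_config):
    """Return the three (method, path, body) steps for a Train recipe.

    recipe_id would normally come from the create response; it is passed in
    here so the sketch stays offline.
    """
    return [
        # 1. Create the recipe ("name" is a hypothetical field).
        ("POST", BASE, json.dumps({"recipe_type": "train", "name": name})),
        # 2. Attach the run configuration.
        ("PUT", f"{BASE}/{recipe_id}", json.dumps({"run_config": run_config})),
        # 3. Execute it.
        ("POST", f"{BASE}/{recipe_id}/run", None),
    ]


steps = train_requests(
    recipe_id=7,
    name="churn",
    run_config={  # key names are assumptions mirroring the Train settings
        "task": "classification",
        "algorithm": "random_forest",
        "target": "churned",
        "features": ["tenure", "monthly_charges", "plan"],
        "test_size": 0.2,
    },
)
```

Keeping creation, configuration, and execution as separate calls matches the canvas behavior, where saving a block does not run it.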
The Python SDK (honeyframeapi) wraps this with high-level helpers — train_model, score_model, cluster_model, evaluate_model — that build the run config and create-then-run the recipe for you, plus list_models / get_model returning SavedModel handles:
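```python
# `project` is assumed to be an existing honeyframeapi project handle.

# Train: fit on the customers table and register a new saved-model version.
model = project.train_model(
    table="customers",
    target="churned",
    features=["tenure", "monthly_charges", "plan"],
    algorithm="random_forest",
    test_size=0.2,
)

# Score: apply the trained version to new rows and write an output table.
project.score_model(model_id=model.id, table="new_customers", output_table="churn_scored")

# Look up a saved model by name, then inspect and manage it.
model = project.get_model("churn")
model.metrics()
model.feature_importance()  # ranked drivers
model.versions()            # version history
model.activate()            # promote a version
```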
For the full SDK surface — the MlRecipe handle and every SavedModel method — see Developer → SDK → Machine learning.
API reference
| Endpoint | Description |
|---|---|
| POST /api/flow/ai-recipes | Create an ML recipe (recipe_type = train/score/cluster/evaluate). |
| PUT /api/flow/ai-recipes/{id} | Set the recipe's run_config. |
| POST /api/flow/ai-recipes/{id}/run | Run the recipe (train / score / cluster / evaluate). |
| GET /api/models | List a project's saved models; ?name= returns one model's version history. |
| GET /api/models/{model_id} | Report detail — metrics, feature importance, sibling versions. |
| POST /api/models/{model_id}/activate | Promote a version to active within its name. |
| DELETE /api/models/{model_id} | Guarded delete (refuses an active version with siblings). |