Version: Current

Machine Learning

The ML Lab brings predictive modeling onto the Flow canvas. Four model recipes — Train, Score, Cluster, and Evaluate — sit in the GenAI palette group alongside the AI recipes, so a model fits into a project the same way every other transform does: an input table feeds a recipe block, and the block produces either a saved model or a scored output table.

Models are first-class, governed assets. A Train recipe registers a versioned saved model that is org-scoped, addressable by name, and inspectable from the Models page — no notebook hand-off, no artifacts to move around. Everything you can do from the canvas is also scriptable; see From the API and SDK.

The sklearn / joblib stack runs in an isolated subprocess (the same source-mode pattern as parse_documents), so the modeling libraries never enter the platform's import graph. The only operational effect is that model runs execute out-of-process and are subject to a 60-minute ceiling.
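The out-of-process pattern can be illustrated with the standard library alone. This is a minimal sketch, not Honeyframe's actual runner: `run_isolated` and the inline child script are hypothetical, and only the 60-minute ceiling is taken from the text above.

```python
import json
import subprocess
import sys

# 60-minute ceiling, taken from the text above; everything else is illustrative.
RUN_TIMEOUT_SECONDS = 60 * 60

# Hypothetical child script: reads a JSON job from stdin, writes a JSON result.
# In a real runner this is where the modeling libraries would be imported,
# so they never load in the parent process.
CHILD_SCRIPT = """
import json, sys
job = json.load(sys.stdin)
print(json.dumps({"action": job["action"], "rows": len(job["rows"])}))
"""


def run_isolated(job, timeout=RUN_TIMEOUT_SECONDS):
    """Run a job in a separate Python process and return its JSON result."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(job),
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired past the ceiling
        check=True,       # raises CalledProcessError on a non-zero exit
    )
    return json.loads(completed.stdout)


result = run_isolated({"action": "train", "rows": [[1, 2], [3, 4]]})
print(result)
```

Because the parent only exchanges JSON with the child, a crash or timeout in the modeling code surfaces as an exception in the parent rather than taking it down.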

The model recipes

Drop a block from the GenAI group of the Add Item palette; Honeyframe opens the ML recipe sidebar, one surface that adapts to the four actions:

Recipe	Input	Produces
🎓 Train	A table	A new version of a saved model.
🎯 Score	A table + a saved model	An output table with predictions appended.
🧩 Cluster	A table	A trained clustering model, optionally a labeled output table.
📊 Evaluate	A table + a saved model	Metrics computed against the supplied rows.

Like every recipe, an ML block doesn't run on save — click Save & Run to execute it. Each block reports its result inline: a model id, a headline metric, and (for Score/Cluster) the output table name. Runs are recorded in job_runs like any other recipe, and errors surface back into the run record.

Training a model

A Train recipe takes an input table and a configuration:

Task — the kind of model (classification or regression).
Algorithm — the learner to fit (e.g. random_forest).
Target — the column to predict.
Features — the columns to learn from. Feature types are split automatically; very high-cardinality columns are dropped.
Test size — the holdout fraction used to compute metrics (e.g. 0.2).
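The automatic feature handling described in the list above might look roughly like the following. This is a sketch under assumptions: the function name, the numeric/categorical split rule, and the `max_cardinality` threshold are all hypothetical, since the actual cutoff is not stated here.

```python
def split_features(rows, features, max_cardinality=50):
    """Split feature columns into numeric and categorical lists.

    Categorical columns with more than `max_cardinality` distinct values
    are dropped. The threshold is a hypothetical value for illustration.
    """
    numeric, categorical, dropped = [], [], []
    for name in features:
        # Ignore missing values when inferring the column type.
        values = [row[name] for row in rows if row.get(name) is not None]
        is_numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            numeric.append(name)
        elif len(set(values)) > max_cardinality:
            dropped.append(name)
        else:
            categorical.append(name)
    return numeric, categorical, dropped


rows = [
    {"tenure": 3, "monthly_charges": 29.5, "plan": "basic", "customer_id": "c-001"},
    {"tenure": 14, "monthly_charges": 54.0, "plan": "pro", "customer_id": "c-002"},
    {"tenure": 7, "monthly_charges": None, "plan": "basic", "customer_id": "c-003"},
]
numeric, categorical, dropped = split_features(
    rows,
    ["tenure", "monthly_charges", "plan", "customer_id"],
    max_cardinality=2,  # tiny threshold so the toy data triggers a drop
)
print(numeric, categorical, dropped)
```

An identifier-like column such as `customer_id` has one distinct value per row, so it is the kind of column a cardinality cutoff removes.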

On run, the recipe fits a preprocessing-plus-model pipeline (imputation and one-hot encoding travel with the model, so scoring later needs no separate prep), evaluates it on the holdout split, and registers a new version of the saved model under its name. The registered row carries the task, algorithm, target, feature list, training-row count, computed metrics, and feature importance (aggregated back from the one-hot columns to the source features). Training is capped at 250k rows.
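To make the feature-importance aggregation concrete, here is a minimal sketch. It assumes importances are summed per source feature and that a mapping from encoded column to source column is available; both the summing rule and the `plan=basic` column naming are assumptions for illustration, not the documented behavior.

```python
from collections import defaultdict


def aggregate_importance(encoded_importance, source_of):
    """Sum importances of encoded columns back onto their source features.

    `source_of` maps an encoded column name to its source feature; columns
    not in the mapping are treated as their own source.
    """
    totals = defaultdict(float)
    for column, weight in encoded_importance.items():
        totals[source_of.get(column, column)] += weight
    # Rank from most to least important.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical importances from a model fitted on one-hot encoded columns.
encoded = {
    "tenure": 0.40,
    "monthly_charges": 0.25,
    "plan=basic": 0.20,
    "plan=pro": 0.15,
}
source_of = {"plan=basic": "plan", "plan=pro": "plan"}

ranked = aggregate_importance(encoded, source_of)
print(ranked)
```

Without this step, `plan` would appear as two weaker drivers instead of one feature ranked between `tenure` and `monthly_charges`.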

Re-running Train with the same model name adds another version rather than overwriting — version history is preserved and you choose which one is active.
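The versioning behavior can be modeled with a small in-memory registry. This is an illustrative sketch only: the class names, the integer version numbering, and the choice to mark the first version active automatically are assumptions, not the platform's storage model.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    active: bool = False


@dataclass
class Registry:
    versions: list = field(default_factory=list)

    def register(self, name):
        """Add a new version under `name` without touching earlier ones."""
        siblings = [v for v in self.versions if v.name == name]
        next_version = max((v.version for v in siblings), default=0) + 1
        # Assumption: the first version of a name starts out active.
        entry = ModelVersion(name, next_version, active=not siblings)
        self.versions.append(entry)
        return entry

    def activate(self, name, version):
        """Make exactly one version of `name` the active one."""
        siblings = [v for v in self.versions if v.name == name]
        if not any(v.version == version for v in siblings):
            raise KeyError(f"{name} v{version} not found")
        for v in siblings:
            v.active = v.version == version


registry = Registry()
v1 = registry.register("churn")
v2 = registry.register("churn")  # same name: a second version, not an overwrite
registry.activate("churn", 2)
print([(v.version, v.active) for v in registry.versions])
```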

Scoring and clustering

Score applies an existing saved model to a table. Point the block at the input table, pick the saved model from the model picker (populated from /api/models), and name the output table. The recipe loads the model's bundled pipeline and writes predictions into the output dataset, which appears on the canvas as a normal downstream table.
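Conceptually, scoring maps each input row through the bundled pipeline and appends the result. The sketch below uses a toy callable in place of a real saved model; `score_table`, `toy_pipeline`, and the `prediction` column name are hypothetical.

```python
def toy_pipeline(row):
    """Stand-in for a saved model's bundled pipeline.

    Imputation happens inside the pipeline, so callers pass raw rows.
    """
    tenure = row.get("tenure")
    tenure = 12 if tenure is None else tenure  # impute a missing value
    return 1 if tenure < 6 else 0              # toy rule, not a fitted model


def score_table(rows, pipeline, prediction_column="prediction"):
    """Return new rows with the pipeline's prediction appended."""
    return [{**row, prediction_column: pipeline(row)} for row in rows]


new_customers = [{"tenure": 2}, {"tenure": None}, {"tenure": 24}]
churn_scored = score_table(new_customers, toy_pipeline)
print(churn_scored)
```

The input rows are left unchanged; the scored output is a separate table, which matches how the output dataset appears as its own downstream node.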

Cluster fits an unsupervised model on the input table. Set k (the number of clusters) and, optionally, an output table to receive each row's cluster label. Clustering produces a trained model the same way Train does, so it shows up on the Models page.
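For intuition about what k controls, here is a plain k-means sketch in pure Python. The text above does not name the clustering algorithm, so k-means is an assumption, as are the deterministic initialization and the `cluster` label column.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels)."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic init for the sketch: use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)

# Optional labeled output: each row gains its cluster label.
rows = [{"x": x, "y": y} for x, y in points]
labeled = [{**row, "cluster": label} for row, label in zip(rows, labels)]
print(labels)
```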

Evaluating

An Evaluate recipe runs a saved model against a supplied table — a holdout set, a fresh extract, or any table with the model's columns — and computes metrics on it, without writing a scored output. Use it to check a model against new data or to compare versions before promoting one.
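As a rough illustration of metrics computed against supplied rows, the sketch below compares true values with predictions. The specific metrics (accuracy, MAE, RMSE) and the `evaluate` function are assumptions; the set of metrics the recipe actually reports is not listed here.

```python
import math


def evaluate(task, y_true, y_pred):
    """Compute simple metrics for a list of true values and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    if task == "classification":
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        return {"accuracy": correct / n}
    if task == "regression":
        errors = [t - p for t, p in zip(y_true, y_pred)]
        return {
            "mae": sum(abs(e) for e in errors) / n,
            "rmse": math.sqrt(sum(e * e for e in errors) / n),
        }
    raise ValueError(f"unknown task: {task}")


# Compare two hypothetical versions on the same supplied rows.
truth = [1, 0, 1, 1]
version_a = evaluate("classification", truth, [1, 0, 0, 1])
version_b = evaluate("classification", truth, [1, 0, 1, 1])
regression = evaluate("regression", [3.0, 5.0], [2.0, 7.0])
print(version_a, version_b, regression)
```

Running both versions against the same rows is what makes the numbers comparable before one is promoted.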

The Models page

The Models page lives under the Intelligence section of the nav (/models). It lists a project's saved models with versions grouped by name, each carrying an active badge, a task badge, and the model's headline metric. project_id defaults to your active project, so the page shows the project you're in.

Click a model to open the report drawer:

Metrics grid — the model's computed metrics.
Feature-importance bars — the top drivers, ranked (top 12).
Provenance — the version's training origin (the on-disk artifact path is never exposed).
Activate — promote a version to be the active one for that name.
Delete — guarded: deleting the active version is refused (409) while sibling versions exist; the artifact directory is cleaned up best-effort.
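The delete guard above reduces to a small decision rule. In this sketch only the 409 refusal comes from the text; the `check_delete` helper, the record shape, and the 404 and 204 status codes are assumptions for illustration.

```python
def check_delete(model_id, versions):
    """Return an HTTP-style status for a delete request.

    `versions` is a list of dicts with "id", "name", and "active" keys
    (a hypothetical record shape).
    """
    target = next((v for v in versions if v["id"] == model_id), None)
    if target is None:
        return 404  # assumed status for an unknown model id
    siblings = [
        v for v in versions
        if v["name"] == target["name"] and v["id"] != model_id
    ]
    if target["active"] and siblings:
        return 409  # refused: active version with sibling versions present
    return 204      # assumed success status


versions = [
    {"id": "m1", "name": "churn", "active": True},
    {"id": "m2", "name": "churn", "active": False},
    {"id": "m3", "name": "ltv", "active": True},
]
statuses = {mid: check_delete(mid, versions) for mid in ["m1", "m2", "m3", "m9"]}
print(statuses)
```

Note that `m3` is active but has no siblings, so the guard does not apply to it; to remove `m1`, another `churn` version would need to be activated first.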

From the API and SDK

Models are authored through the same machinery as other AI recipes. A Train/Score/Cluster/Evaluate block is an entry in /api/flow/ai-recipes (create → set run_config → run), and the resulting saved models are read and managed through /api/models.
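The create, set run_config, run sequence can be sketched as three HTTP requests. The code below only builds the requests and does not send them. The paths, `recipe_type`, and `run_config` come from this page; the host, bearer-token header, placeholder recipe id, and the keys inside `run_config` are assumptions.

```python
import json
import urllib.request


def build_request(base_url, method, path, payload=None, token="YOUR_TOKEN"):
    """Build (but do not send) a JSON request. The auth scheme is assumed."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def train_requests(base_url, recipe_id, run_config):
    """Return the three requests for a Train recipe, in order.

    In practice `recipe_id` would be read from the create response;
    here it is passed in as a placeholder.
    """
    return [
        build_request(base_url, "POST", "/api/flow/ai-recipes",
                      {"recipe_type": "train"}),
        build_request(base_url, "PUT", f"/api/flow/ai-recipes/{recipe_id}",
                      {"run_config": run_config}),
        build_request(base_url, "POST", f"/api/flow/ai-recipes/{recipe_id}/run"),
    ]


# Hypothetical run_config keys, mirroring the Train settings described earlier.
run_config = {
    "task": "classification",
    "algorithm": "random_forest",
    "target": "churned",
    "features": ["tenure", "monthly_charges", "plan"],
    "test_size": 0.2,
}
requests = train_requests("https://honeyframe.example", "RECIPE_ID", run_config)
for req in requests:
    print(req.get_method(), req.full_url)
```

Sending each request with `urllib.request.urlopen` would complete the flow against a real deployment.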

The Python SDK (honeyframeapi) wraps this with high-level helpers — train_model, score_model, cluster_model, evaluate_model — that build the run config and create-then-run the recipe for you, plus list_models / get_model returning SavedModel handles:

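# Assumes `project` is an existing honeyframeapi project handle.

# Train: fit the model and register a new saved-model version.
model = project.train_model(
    table="customers",
    target="churned",                                 # column to predict
    features=["tenure", "monthly_charges", "plan"],  # columns to learn from
    algorithm="random_forest",
    test_size=0.2,                                    # holdout fraction used for metrics
)

# Score: apply the trained model to new rows and write predictions to churn_scored.
project.score_model(model_id=model.id, table="new_customers", output_table="churn_scored")

# Look up a saved model by name, then inspect or manage it.
model = project.get_model("churn")
model.metrics()              # computed metrics
model.feature_importance()   # ranked drivers
model.versions()             # version history
model.activate()             # promote a version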

For the full SDK surface — the MlRecipe handle and every SavedModel method — see Developer → SDK → Machine learning.

API reference

Endpoint	Description
`POST /api/flow/ai-recipes`	Create an ML recipe (`recipe_type` = `train`/`score`/`cluster`/`evaluate`).
`PUT /api/flow/ai-recipes/{id}`	Set the recipe's `run_config`.
`POST /api/flow/ai-recipes/{id}/run`	Run the recipe (train / score / cluster / evaluate).
`GET /api/models`	List a project's saved models; `?name=` returns one model's version history.
`GET /api/models/{model_id}`	Report detail — metrics, feature importance, sibling versions.
`POST /api/models/{model_id}/activate`	Promote a version to active within its name.
`DELETE /api/models/{model_id}`	Guarded delete (refuses an active version with siblings).
