
Machine Learning

The ML Lab brings predictive modeling onto the Flow canvas. Four model recipes — Train, Score, Cluster, and Evaluate — sit in the GenAI palette group alongside the AI recipes, so a model fits into a project the same way every other transform does: an input table feeds a recipe block, and the block produces either a saved model or a scored output table.

Models are first-class, governed assets. A Train recipe registers a versioned saved model that is org-scoped, addressable by name, and inspectable from the Models page — no notebook hand-off, no artifacts to move around. Everything you can do from the canvas is also scriptable; see From the API and SDK.

The sklearn / joblib stack runs in an isolated subprocess (the same source-mode pattern as parse_documents), so the modeling libraries never enter the platform's import graph. This is invisible in day-to-day use; the practical consequence is that model runs execute out-of-process and are subject to a 60-minute ceiling.
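The out-of-process pattern described above can be illustrated with the standard library alone. This is a minimal sketch, not Honeyframe's actual runner: the worker source, the `run_isolated` helper, and the JSON-over-stdin protocol are all hypothetical, and only the 60-minute ceiling comes from the text.

```python
import json
import subprocess
import sys

# Hypothetical worker: stands in for the real training entry point.
# Heavy modeling imports (sklearn, joblib) would live only in this child process.
WORKER_SOURCE = """
import json, sys
config = json.load(sys.stdin)
json.dump({"status": "ok", "algorithm": config["algorithm"]}, sys.stdout)
"""


def run_isolated(config, timeout_seconds=60 * 60, source=WORKER_SOURCE):
    """Run the worker in a child interpreter and return its JSON result."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            input=json.dumps(config),   # config travels over stdin
            capture_output=True,
            text=True,
            timeout=timeout_seconds,    # the run ceiling; child is killed on expiry
            check=True,                 # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout"}
    except subprocess.CalledProcessError as exc:
        return {"status": "error", "detail": exc.stderr.strip()}
    return json.loads(completed.stdout)


result = run_isolated({"algorithm": "random_forest"})
```

Because the parent process only ever handles JSON, nothing from the modeling stack has to be importable on the platform side.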

The model recipes

Drop a block from the GenAI group of the Add Item palette; Honeyframe opens the ML recipe sidebar, one surface that adapts to the four actions:

Recipe      | Input                   | Produces
🎓 Train    | A table                 | A new version of a saved model.
🎯 Score    | A table + a saved model | An output table with predictions appended.
🧩 Cluster  | A table                 | A trained clustering model, optionally a labeled output table.
📊 Evaluate | A table + a saved model | Metrics computed against the supplied rows.

Like every recipe, an ML block doesn't run on save — click Save & Run to execute it. Each block reports its result inline: a model id, a headline metric, and (for Score/Cluster) the output table name. Runs are recorded in job_runs like any other recipe, and errors surface back into the run record.

Training a model

A Train recipe takes an input table and a configuration:

  • Task — the kind of model (classification or regression).
  • Algorithm — the learner to fit (e.g. random_forest).
  • Target — the column to predict.
  • Features — the columns to learn from. Feature types are split automatically; very high-cardinality columns are dropped.
  • Test size — the holdout fraction used to compute metrics (e.g. 0.2).

On run, the recipe fits a single preprocessing-plus-model pipeline, evaluates it on the holdout split, and registers a new version of the saved model under its name. Imputation and one-hot encoding travel with the model, so scoring later needs no separate prep. The registered row carries the task, algorithm, target, feature list, training-row count, computed metrics, and feature importance (aggregated back from the one-hot columns to the source features). Training is capped at 250k rows.
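To make the training step concrete, here is a rough scikit-learn sketch of a pipeline that bundles imputation and one-hot encoding with the model, scores it on a holdout split, and sums importances back to the source features. It is an illustration under assumptions, not the recipe's internals: the synthetic table, the median and most-frequent imputation strategies, and the aggregation loop are all choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for an input table (not real data).
rng = np.random.default_rng(0)
n = 200
table = pd.DataFrame({
    "tenure": rng.integers(1, 72, n).astype(float),
    "monthly_charges": rng.uniform(20, 120, n),
    "plan": rng.choice(["basic", "plus", "pro"], n),
})
table.loc[::25, "tenure"] = np.nan  # a few missing values to impute
table["churned"] = (table["monthly_charges"] > 70).astype(int)

numeric = ["tenure", "monthly_charges"]
categorical = ["plan"]

# Preprocessing is part of the pipeline, so it is saved and reused with the model.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
pipeline = Pipeline([
    ("prep", preprocess),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    table[numeric + categorical], table["churned"], test_size=0.2, random_state=0
)
pipeline.fit(X_train, y_train)
holdout_accuracy = pipeline.score(X_test, y_test)

# Aggregate importances from encoded columns back to the source features.
# Encoded column order is: numeric columns first, then one block per categorical.
onehot = pipeline.named_steps["prep"].named_transformers_["cat"].named_steps["onehot"]
widths = [1] * len(numeric) + [len(cats) for cats in onehot.categories_]
importances = pipeline.named_steps["model"].feature_importances_
feature_importance, start = {}, 0
for name, width in zip(numeric + categorical, widths):
    feature_importance[name] = float(importances[start:start + width].sum())
    start += width
```

Keeping the encoder inside the pipeline is what lets a later scoring step accept the raw table without repeating any preparation.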

Re-running Train with the same model name adds another version rather than overwriting — version history is preserved and you choose which one is active.
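The append-only versioning behaviour can be pictured with a small in-memory registry. This is only a sketch: the `Registry` and `ModelVersion` classes are hypothetical, and the rule that the first version of a name starts out active is an assumption made for the example, not something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    model_id: int
    name: str
    version: int
    metrics: dict
    active: bool = False


@dataclass
class Registry:
    rows: list = field(default_factory=list)

    def register(self, name, metrics):
        """Add a new version under `name`; earlier versions are never overwritten."""
        siblings = [r for r in self.rows if r.name == name]
        row = ModelVersion(
            model_id=len(self.rows) + 1,
            name=name,
            version=len(siblings) + 1,
            metrics=metrics,
            active=not siblings,  # assumption: the first version starts active
        )
        self.rows.append(row)
        return row

    def activate(self, model_id):
        """Make one version the active one for its name and demote its siblings."""
        target = next(r for r in self.rows if r.model_id == model_id)
        for r in self.rows:
            if r.name == target.name:
                r.active = r.model_id == model_id


registry = Registry()
v1 = registry.register("churn", {"accuracy": 0.81})  # illustrative metric values
v2 = registry.register("churn", {"accuracy": 0.84})  # same name: a second version
registry.activate(v2.model_id)
```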

Scoring and clustering

Score applies an existing saved model to a table. Point the block at the input table, pick the saved model from the model picker (populated from /api/models), and name the output table. The recipe loads the model's bundled pipeline and writes predictions into the output dataset, which appears on the canvas as a normal downstream table.

Cluster fits an unsupervised model on the input table. Set k (the number of clusters) and, optionally, an output table to receive each row's cluster label. Clustering produces a trained model the same way Train does, so it shows up on the Models page.
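As an illustration of what setting k means, the sketch below fits a clustering model and produces one label per row. It assumes k-means via scikit-learn's `KMeans`; the text does not say which clustering algorithm the recipe uses, and the input rows are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic input: three well-separated groups of 50 rows each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
rows = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

k = 3  # the number of clusters, as set on the recipe
model = KMeans(n_clusters=k, n_init=10, random_state=0)

# One integer label per input row; this is what a labeled output table would carry.
labels = model.fit_predict(rows)
```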

Evaluating

An Evaluate recipe runs a saved model against a supplied table — a holdout set, a fresh extract, or any table with the model's columns — and computes metrics on it, without writing a scored output. Use it to check a model against new data or to compare versions before promoting one.
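Comparing versions on the same table might look like the following sketch. The two "versions" are stand-in predict functions rather than real saved models, the rows are made up for illustration, and the choice of accuracy, precision, and recall is an assumption about which metrics are of interest.

```python
def evaluate(predict, rows, target):
    """Compute binary classification metrics on labeled rows; nothing is written out."""
    tp = fp = fn = correct = 0
    for row in rows:
        predicted, actual = predict(row), row[target]
        correct += predicted == actual
        tp += predicted == 1 and actual == 1
        fp += predicted == 1 and actual == 0
        fn += predicted == 0 and actual == 1
    return {
        "accuracy": correct / len(rows),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Illustrative rows only, standing in for a fresh extract with the model's columns.
fresh_rows = [
    {"monthly_charges": 30, "churned": 0},
    {"monthly_charges": 65, "churned": 0},
    {"monthly_charges": 75, "churned": 1},
    {"monthly_charges": 85, "churned": 1},
    {"monthly_charges": 95, "churned": 1},
    {"monthly_charges": 72, "churned": 0},
]


def version_1(row):  # hypothetical older version
    return int(row["monthly_charges"] > 80)


def version_2(row):  # hypothetical newer version
    return int(row["monthly_charges"] > 70)


report_v1 = evaluate(version_1, fresh_rows, "churned")
report_v2 = evaluate(version_2, fresh_rows, "churned")
```

Here both versions reach the same accuracy but trade precision against recall, which is the kind of difference worth seeing before promoting one.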

The Models page

The Models page lives under the Intelligence section of the nav (/models). It lists a project's saved models with versions grouped by name, each carrying an active badge, a task badge, and the model's headline metric. The page's project_id defaults to your active project, so it shows the models for the project you're in.

Click a model to open the report drawer:

  • Metrics grid — the model's computed metrics.
  • Feature-importance bars — the top drivers, ranked (top 12).
  • Provenance — the version's training origin (the on-disk artifact path is never exposed).
  • Activate — promote a version to be the active one for that name.
  • Delete — guarded: deleting the active version is refused (409) while sibling versions exist; the artifact directory is cleaned up best-effort.
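The delete guard in the last bullet amounts to a small rule, sketched here over plain dictionaries. The `Conflict` exception and `delete_version` helper are hypothetical names for illustration; only the 409-on-active-with-siblings behaviour comes from the text.

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 response."""
    status_code = 409


def delete_version(rows, model_id):
    """Return the rows that remain after deleting `model_id`, enforcing the guard."""
    target = next(r for r in rows if r["id"] == model_id)
    has_siblings = any(
        r["name"] == target["name"] and r["id"] != model_id for r in rows
    )
    # Refuse only when the version is active AND other versions of the name exist.
    if target["active"] and has_siblings:
        raise Conflict("activate another version before deleting this one")
    return [r for r in rows if r["id"] != model_id]


rows = [
    {"id": 1, "name": "churn", "active": False},
    {"id": 2, "name": "churn", "active": True},
    {"id": 3, "name": "segments", "active": True},
]
remaining = delete_version(rows, 1)  # inactive version: allowed
```

Under this rule an active version with no siblings (id 3 above) can still be deleted, since there is no other version left to fall back on.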

From the API and SDK

Models are authored through the same machinery as other AI recipes. A Train/Score/Cluster/Evaluate block is an entry in /api/flow/ai-recipes (create → set run_config → run), and the resulting saved models are read and managed through /api/models.
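The create, set run_config, run sequence can be sketched as three calls through an injected transport. The endpoint paths and the recipe_type and run_config names come from this page; everything else is an assumption for illustration, including the `send` callable, the request body shapes, the `id` field in the create response, and the key names inside the run config.

```python
def train_via_api(send, run_config):
    """Create a Train recipe, attach its run_config, then run it.

    `send(method, path, body)` is a hypothetical transport that performs the
    HTTP call and returns the decoded JSON response.
    """
    created = send("POST", "/api/flow/ai-recipes", {"recipe_type": "train"})
    recipe_id = created["id"]  # assumption: the response carries the new id as "id"
    send("PUT", f"/api/flow/ai-recipes/{recipe_id}", {"run_config": run_config})
    return send("POST", f"/api/flow/ai-recipes/{recipe_id}/run", {})


# A fake transport that records calls, so the sequence can be inspected offline.
calls = []


def fake_send(method, path, body):
    calls.append((method, path))
    if (method, path) == ("POST", "/api/flow/ai-recipes"):
        return {"id": 42}
    return {"status": "ok"}


# Key names below mirror the Train configuration fields but are assumed, not documented.
outcome = train_via_api(fake_send, {
    "task": "classification",
    "algorithm": "random_forest",
    "target": "churned",
    "features": ["tenure", "monthly_charges", "plan"],
    "test_size": 0.2,
})
```

Injecting the transport keeps the sketch independent of any particular HTTP library or authentication scheme.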

The Python SDK (honeyframeapi) wraps this with high-level helpers — train_model, score_model, cluster_model, evaluate_model — that build the run config and create-then-run the recipe for you, plus list_models / get_model returning SavedModel handles:

# `project` is assumed to be a project handle obtained from the honeyframeapi client.
model = project.train_model(
    table="customers",
    target="churned",
    features=["tenure", "monthly_charges", "plan"],
    algorithm="random_forest",
    test_size=0.2,
)
project.score_model(model_id=model.id, table="new_customers", output_table="churn_scored")

model = project.get_model("churn")
model.metrics()
model.feature_importance()  # ranked drivers
model.versions()            # version history
model.activate()            # promote a version

For the full SDK surface — the MlRecipe handle and every SavedModel method — see Developer → SDK → Machine learning.

API reference

Endpoint                             | Description
POST /api/flow/ai-recipes            | Create an ML recipe (recipe_type = train/score/cluster/evaluate).
PUT /api/flow/ai-recipes/{id}        | Set the recipe's run_config.
POST /api/flow/ai-recipes/{id}/run   | Run the recipe (train / score / cluster / evaluate).
GET /api/models                      | List a project's saved models; ?name= returns one model's version history.
GET /api/models/{model_id}           | Report detail: metrics, feature importance, sibling versions.
POST /api/models/{model_id}/activate | Promote a version to active within its name.
DELETE /api/models/{model_id}        | Guarded delete (refuses an active version with siblings).
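For the model-management endpoints, a small helper that builds the method and URL for each call may be a useful starting point. The host name is a placeholder, the helper names are hypothetical, and passing project_id as a query parameter is an assumption; only the paths, verbs, and the ?name= filter come from the table above.

```python
from urllib.parse import urlencode

BASE_URL = "https://honeyframe.example.com"  # placeholder host, not a real deployment


def list_models_url(name=None, project_id=None):
    """URL for GET /api/models; `name` narrows the result to one model's versions."""
    candidates = {"name": name, "project_id": project_id}  # project_id param is assumed
    params = {key: value for key, value in candidates.items() if value is not None}
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/api/models{query}"


def model_request(action, model_id):
    """Return (method, url) for a single-model action: detail, activate, or delete."""
    base = f"{BASE_URL}/api/models/{model_id}"
    routes = {
        "detail": ("GET", base),
        "activate": ("POST", f"{base}/activate"),
        "delete": ("DELETE", base),
    }
    return routes[action]


history_url = list_models_url(name="churn")
activate_call = model_request("activate", 7)
```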