
v0.0.82 — Cobuild lineage tool + Dataiku-parity slices, Catalog cross-tool stitch

Released: 2026-05-20. 4 commits.

A focused Cobuild release that wires the v0.0.81 catalog walk into the planner as a first-class agent tool, lights up five Dataiku-parity slices on the Cobuild surface, and closes a long-standing cross-tool gap between the Dataiku adapter and S3Adapter.

Cobuild — get_lineage agent tool

The catalog walk endpoint shipped in v0.0.81 (/api/catalog/lineage/walk) is now wrapped as a read-only agent tool so Cobuild and the Dashboard chat agent can answer questions like "how was this dataset built?" or "where does column Y come from?" by tracing across every configured catalog adapter (dbt, Dataiku, SSIS, Informatica, Kafka).

The tool lives in dashboard_chat_agent.py. It supports a direction filter and optional column scoping, trims results to 3 KB, and returns a graceful no-edges hint when no adapters are configured. cobuild_planner.py re-exports the schema via _REUSED_TOOL_NAMES and adds a planner system-prompt bullet so the planner picks it up without duplication. catalog_discovery.discover_datasets now stamps a catalog_fqn on every result, so the agent can chain discover_datasets straight into get_lineage. Only dbt-shaped FQNs are synthesised today; other adapter FQN shapes are deferred. Seven unit tests cover both surfaces.
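The direction filter, column scoping, 3 KB trim, and no-edges hint could be sketched roughly as follows. This is an illustration only: the edge dictionary shape (src, dst, columns), the function names filter_edges and trim_result, and the hint wording are assumptions, not the actual contents of dashboard_chat_agent.py.

```python
import json

MAX_RESULT_BYTES = 3 * 1024  # the 3 KB trim mentioned above

# Hypothetical hint text; the real wording is not given in these notes.
NO_EDGES_HINT = "No lineage edges found. Check that a catalog adapter is configured."


def filter_edges(edges, fqn, direction="both", column=None):
    """Keep edges touching `fqn` in the requested direction, optionally scoped to a column."""
    kept = []
    for edge in edges:
        is_upstream = edge["dst"] == fqn    # edge flows into fqn
        is_downstream = edge["src"] == fqn  # edge flows out of fqn
        if direction == "upstream" and not is_upstream:
            continue
        if direction == "downstream" and not is_downstream:
            continue
        if direction == "both" and not (is_upstream or is_downstream):
            continue
        if column is not None and column not in edge.get("columns", []):
            continue
        kept.append(edge)
    return kept


def trim_result(edges, max_bytes=MAX_RESULT_BYTES):
    """Serialise as many edges as fit within max_bytes and flag any truncation."""
    if not edges:
        return json.dumps({"edges": [], "hint": NO_EDGES_HINT})
    kept = []
    for edge in edges:
        # Measure with the longer "false" flag so the final payload never exceeds the cap.
        candidate = json.dumps({"edges": kept + [edge], "truncated": False})
        if len(candidate.encode("utf-8")) > max_bytes:
            return json.dumps({"edges": kept, "truncated": True})
        kept.append(edge)
    return json.dumps({"edges": kept, "truncated": False})
```

Trimming whole edges rather than cutting the serialised string keeps the payload valid JSON, which matters when the result is handed back to a model as tool output.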

Cobuild — planner extension and DATABASE_URL fallback

A follow-on increment to the get_lineage work extends the planner schema and workstream spawn flow (cobuild_planner.py grows by 113 lines, with the new contract pinned by test_cobuild_planner additions). extract_codegen.get_db_conn() now falls back to parsing DATABASE_URL / HUB_PLATFORM_DATABASE_URL when the legacy DWH_* / DB_* env vars are not set, so Cobuild-generated extract scripts run on any ECS without per-host config. The recipe_runner lookup now also selects script_path and inline_code, giving downstream code direct access to a recipe's executable payload. A new Playwright script, tests/e2e/demo-cobuild-recipe.mjs, demos the Cobuild-to-recipe flow end to end.
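The fallback order can be illustrated with a small standalone resolver. This is a sketch, not the body of extract_codegen.get_db_conn(): the function name resolve_db_params, the specific legacy variable names (DWH_HOST, DWH_PORT, and so on), and the PostgreSQL default port are all assumptions, and the DB_* variant is omitted for brevity.

```python
import os
from urllib.parse import unquote, urlsplit


def resolve_db_params(env=None):
    """Return connection parameters, preferring legacy variables over URL-style ones."""
    env = os.environ if env is None else env

    # Legacy path. Only the DWH_* prefix comes from the notes; the suffixes are assumed.
    if env.get("DWH_HOST"):
        return {
            "host": env["DWH_HOST"],
            "port": int(env.get("DWH_PORT", "5432")),
            "user": env.get("DWH_USER"),
            "password": env.get("DWH_PASSWORD"),
            "dbname": env.get("DWH_NAME"),
        }

    # Fallback path: parse a connection URL from either variable.
    url = env.get("DATABASE_URL") or env.get("HUB_PLATFORM_DATABASE_URL")
    if not url:
        raise RuntimeError("no database configuration found in environment")

    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumed default (PostgreSQL)
        # Credentials in URLs are percent-encoded, so decode them before use.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "dbname": parts.path.lstrip("/") or None,
    }
```

Passing the environment as an argument keeps the resolver easy to test without mutating os.environ.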

Cobuild — Dataiku-parity slices 1–5, ghost nodes, auto-run

Five slices on the Cobuild surface together close the parity gap with Dataiku Answers patterns.

Slice 1 — inline data cards. Planner traces of found_datasets, dq_summary, build_step, and navigate are synthesised into inline cards in the chat transcript. A new cobuild_messages.inline_cards JSONB column plus migration backs the persistence.

Slice 2 — navigate_to tool. The planner can now swap the host pane mid-loop via a navigate_to tool with a route whitelist (project routes including flow, datasets, dashboards) so a single plan can take the user from a dataset card to the matching dashboard without losing context.
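A route whitelist of this kind might be validated as below. The URL layout (/projects/<slug>/<view>), the slug character set, and the name validate_route are hypothetical; the notes only state that project routes including flow, datasets, and dashboards are allowed.

```python
import re

# Hypothetical pattern: only the three view names come from the release notes.
ALLOWED_ROUTE = re.compile(
    r"/projects/(?P<slug>[a-z0-9][a-z0-9-]*)/(?P<view>flow|datasets|dashboards)"
)


def validate_route(route):
    """Return the parsed slug and view for an allowed route, or raise ValueError."""
    match = ALLOWED_ROUTE.fullmatch(route)
    if match is None:
        raise ValueError(f"route not allowed: {route!r}")
    return {"slug": match.group("slug"), "view": match.group("view")}
```

Using fullmatch rather than a prefix check rejects absolute URLs and path-traversal suffixes appended to an otherwise valid route.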

Slice 4: streaming narration. agent_runtime fires on_progress after every tool call, the router UPDATEs a plan_state placeholder, and a frontend hook polls every 900 ms while a run is busy. The transcript renders terminal-style check-mark tool labels plus a pulsing circle "Working" indicator so the user sees progress in real time.
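The callback pattern behind the streaming narration can be sketched as a tool loop that reports after each call. The function name run_tools, the (name, kwargs) call shape, and the keyword names passed to on_progress are assumptions; the notes only say that on_progress fires after every tool.

```python
def run_tools(tool_calls, tools, on_progress=None):
    """Run each (name, kwargs) call in order, reporting progress after every one."""
    results = []
    total = len(tool_calls)
    for step, (name, kwargs) in enumerate(tool_calls, start=1):
        results.append(tools[name](**kwargs))
        if on_progress is not None:
            # Keyword arguments let the receiver ignore fields it does not need.
            on_progress(tool=name, step=step, total=total)
    return results
```

A router could pass a callback that writes each event into the plan_state placeholder, which the polling frontend then reads on its next tick.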

Slice 5: Flow Builder as host pane. Guided by the planner system prompt and the project-slug context, the planner routes to the project's flow view after create_recipe, so a freshly built recipe lands on the canvas where the user expects to keep editing it.

Plan A — lineage ghost nodes. flow_recipes.output_datasets entries with no materialised table now emit a synthetic dashed-amber ghost node (FQN shape ghost.recipe.<id>.<name>) so a just-created recipe is visible on the canvas before its first successful run.
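As an illustration of the ghost-node rule, the helper below emits one synthetic node per output dataset that has no materialised table. Only the FQN shape ghost.recipe.<id>.<name> comes from the notes; the function name, the node dictionary fields, and the use of a set for materialised tables are assumptions.

```python
def ghost_nodes(recipe_id, output_datasets, materialised):
    """Build synthetic nodes for recipe outputs that have no materialised table yet."""
    nodes = []
    for name in output_datasets:
        if name in materialised:
            continue  # a real table exists, so no ghost is needed
        nodes.append(
            {
                "fqn": f"ghost.recipe.{recipe_id}.{name}",
                "ghost": True,
                "style": "dashed-amber",  # hypothetical field carrying the canvas style
            }
        )
    return nodes
```

For example, ghost_nodes(42, ["orders_clean", "orders_agg"], {"orders_agg"}) yields a single node with the FQN ghost.recipe.42.orders_clean.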

Plan B — quickstart auto-run. The quickstart seeds the first thread inline (kind=pipeline) so the Send button works on a fresh session without waiting for the auto-run race, and Send is disabled while the active thread id is null. Coverage: 10 inline-card unit tests, 10 navigate_to validation tests, planner catalog pin, router quickstart, and a planner-fake on_progress kwargs test.

Catalog — Dataiku to S3 cross-tool stitch

When a DSS dataset's detail JSON reports type=S3, the Dataiku adapter now synthesises a REPLICATE edge from the canonical s3.<bucket>.<schema>.<dataset> FQN into the dataiku.* FQN. This is the same FQN convention S3Adapter publishes, so the lineage walk endpoint threads through without either side knowing about the other — a mirror of the OGG vendor-prefix pattern from v0.0.81. Hive partition segments in the DSS path (for example a dt=2026-05-01 segment) are stripped so the upstream FQN collapses to the same prefix that S3Adapter would emit for the same bucket layout. The stitch falls back silently when the dataset detail returns 404, when the type is not S3, or when params lack a bucket or path, so table-level recipe edges keep working in every failure mode.
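The FQN derivation and its silent fallbacks could look roughly like this. It is a sketch under stated assumptions: the detail dictionary shape (type, params.bucket, params.path), the choice of the last two non-partition path segments as <schema> and <dataset>, and the name s3_upstream_fqn are all hypothetical and may differ from what the adapter actually does.

```python
import re

# A Hive partition segment looks like key=value, for example dt=2026-05-01.
HIVE_PARTITION = re.compile(r"[^=/]+=[^/]*")


def s3_upstream_fqn(detail):
    """Return the s3.* FQN for a DSS dataset detail dict, or None to skip the stitch."""
    if not detail or detail.get("type") != "S3":
        return None  # covers a missing detail (for example after a 404) and non-S3 types
    params = detail.get("params") or {}
    bucket, path = params.get("bucket"), params.get("path")
    if not bucket or not path:
        return None
    # Drop empty segments and Hive partition segments from the path.
    segments = [s for s in path.split("/") if s and not HIVE_PARTITION.fullmatch(s)]
    if len(segments) < 2:
        return None  # assumed: both <schema> and <dataset> are required
    schema, dataset = segments[-2], segments[-1]
    return f"s3.{bucket}.{schema}.{dataset}"
```

Returning None instead of raising mirrors the silent fallback described above: the caller simply emits no REPLICATE edge and the table-level recipe edges are unaffected.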