Version: v0.1.8

v0.0.81 — Catalog Phase 4 live + cross-tool lineage, Agent Chat, Webapp builder polish

Released: 2026-05-20. 38 commits.

A heavy catalog release. Phase 4 closes — every file-inventory adapter (Dataiku, Kafka, Informatica, SSIS, Oracle GoldenGate) now has a live extractor — and the cockpit gets cross-tool lineage stitching, column-level edges via sqlglot, and a real SVG graph view. Separately, the chat surface is rebuilt around Dataiku Answers patterns, Agent Reviews gains retry + live progress, and the Webapp builder closes the last JSONB-only gaps on chrome, theme, filters, and card-level overrides.

Catalog — Phase 4 live extractors complete

The four file-inventory adapters introduced in v0.0.80 all grow a live extraction path. Each follows the same factory-dispatch pattern: the registry name stays stable (dataiku, kafka, informatica, ssis), and live mode activates as soon as that adapter's mode-specific config keys are set. File mode stays the default, so v0.0.80 tenants need no config change.

  • Dataiku DSS — walks /public/api/projects/, enumerates datasets and recipes, and parses recipe inputs[]/outputs[] into lineage edges. HTTP Basic with api_key as username (DSS convention). Per-project failures are caught and logged so a half-broken DSS does not blank the catalog.
  • Kafka Schema Registry — walks /subjects + /subjects/{s}/versions/latest, folds <topic>-value / <topic>-key subjects under a shared topic property, and surfaces Avro doc as description. Optional HTTP Basic for Confluent Cloud.
  • Kafka Connect — when connect_url is configured, the same Kafka adapter walks /connectors and emits each as a connector asset tagged sink or source. get_lineage synthesises connector to topic edges: sink connectors get topic_fqn -> connector_fqn, source connectors get the reverse. Debezium / JDBC prefix-only sources surface a synthetic prefix* topic so the graph never dangles.
  • Informatica IICS / IDMC — the IICS two-step login protocol (POST /ma/api/v2/user/login then GET {serverUrl}/api/v2/mapping) plus mtTask enumeration. Mappings surface as mapping assets, tasks as task carrying mapping_id so the cockpit can chain them. PowerCenter (on-prem, pmrep-based) stays in file mode — mixed shops configure two informatica instances.
  • SSIS — has no REST API, so live mode parses a directory of checked-out .dtsx XML files. Each package becomes one asset; nested Pipeline Executables surface as dataflow tags; OLE-DB OpenRowset values get captured as referenced tables. Project subfolders carve into FQN components so duplicate package names across projects do not collide.
  • Oracle GoldenGate .prm config walker — replaces the stub OggAdapter with a real parameter-file parser. Each .prm surfaces as one extract / replicat / pump asset; MAP / TABLE / trail-file statements become REPLICATE edges, including multi-line MAPs. When an EXTRACT writes a trail any other group declares as input, an extract-to-replicat edge stitches the full CDC chain so the cockpit shows the entire pipeline as one path.
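The Kafka Connect edge rule from the list above can be sketched roughly as follows. This is an illustration only: the connector record shape (name, type, topics, topic_prefix), the kafka.<cluster>.<kind>.<name> FQN layout, and the function name connector_edges are assumptions, not the adapter's actual code.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    upstream_fqn: str
    downstream_fqn: str


def connector_edges(connector: dict, cluster: str = "default") -> list:
    """Synthesise connector/topic edges for one Kafka Connect connector."""
    # FQN layout below is hypothetical, chosen only for this sketch.
    connector_fqn = f"kafka.{cluster}.connector.{connector['name']}"
    topics = list(connector.get("topics", []))
    # Prefix-only sources (Debezium / JDBC style) carry no explicit topic
    # list, so emit a synthetic "<prefix>*" topic to keep the graph connected.
    if not topics and connector.get("topic_prefix"):
        topics = [connector["topic_prefix"] + "*"]

    edges = []
    for topic in topics:
        topic_fqn = f"kafka.{cluster}.topic.{topic}"
        if connector["type"] == "sink":
            edges.append(Edge(topic_fqn, connector_fqn))  # topic -> connector
        else:
            edges.append(Edge(connector_fqn, topic_fqn))  # connector -> topic
    return edges
```

A sink connector reading orders yields one edge from the topic to the connector; a source connector with only a prefix yields one edge from the connector to the synthetic prefix* topic.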

OGG also picks up vendor-prefix FQN stitching: with source_vendor / target_vendor config keys plus a parsed SOURCEDB / TARGETDB, edges emit FQNs like oracle.<db>.<schema>.<tbl> instead of the OGG-local namespace. This was the missing piece for cross-tool lineage: an Oracle catalog adapter pointed at the same DB now publishes identical FQNs, so the walk endpoint stitches the hop with no glue code.
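A minimal sketch of the vendor-prefix decision, assuming the prefix applies only when both the vendor key and the parsed database name are present. The function name stitched_fqn, the lower-casing, and the fallback ogg.<group>.<schema>.<tbl> layout are hypothetical and chosen only to make the example concrete.

```python
from typing import Optional


def stitched_fqn(vendor: Optional[str], db: Optional[str],
                 schema: str, table: str, group: str) -> str:
    """Return a vendor-prefixed FQN when vendor and database are both known."""
    if vendor and db:
        parts = (vendor, db, schema, table)
    else:
        # Assumed OGG-local fallback namespace, for illustration only.
        parts = ("ogg", group, schema, table)
    # Lower-casing is an assumption so two adapters agree on spelling.
    return ".".join(part.lower() for part in parts)
```

The point of the design is that two independent adapters derive the same string from the same physical table, so matching hops is a plain equality check.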

Catalog — S3 adapter

New S3Adapter walks a bucket prefix via boto3, groups data files (.parquet / .csv / .json / .jsonl / .avro / .orc / .tsv) by their parent prefix, and collapses Hive-style partition segments (dt=2026-05-01/region=us) so a partitioned dataset becomes one CatalogAsset rather than thousands. FQN shape s3.<bucket>.<schema>.<table> matches every other adapter, so the walk endpoint pivots on the s3 prefix without special casing. get_lineage() returns [] — S3 is a storage layer; upstream tools (Dataiku S3 recipe, dbt-spark, OGG bigdata adapter) own the edges that touch s3.* FQNs.
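The partition-collapsing step can be illustrated without touching S3 at all. The sketch below takes a list of object keys (as a bucket listing would return them) and groups data files by parent prefix with Hive-style key=value segments dropped. The function name group_datasets and the regex are assumptions; the suffix list comes from the text above.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

DATA_SUFFIXES = {".parquet", ".csv", ".json", ".jsonl", ".avro", ".orc", ".tsv"}
PARTITION_SEGMENT = re.compile(r"^[^=/]+=[^/]*$")  # e.g. dt=2026-05-01


def group_datasets(keys: list) -> dict:
    """Group data-file keys by parent prefix, ignoring partition segments."""
    datasets = defaultdict(list)
    for key in keys:
        path = PurePosixPath(key)
        if path.suffix.lower() not in DATA_SUFFIXES:
            continue  # skip markers such as _SUCCESS and other non-data files
        parents = [p for p in path.parent.parts if not PARTITION_SEGMENT.match(p)]
        datasets["/".join(parents)].append(key)
    return dict(datasets)
```

Files under sales/orders/dt=.../region=.../ all land in a single sales/orders group, which is what lets a partitioned dataset surface as one asset.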

Catalog — column-level lineage via sqlglot

New routers/catalog/column_lineage.py extracts a column map ({out_col: [(upstream_fqn, upstream_col), ...]}) from a SQL string plus an upstream resolver. Pure SQL parsing, no adapter coupling — any adapter carrying SQL text can call it.

Wired into three adapters this release:

  • dbt — each parent edge now carries the slice of column_map pointing back to that specific parent. Table-level edges still emit when SQL parsing fails or the model has only un-rendered Jinja.
  • Dataiku: DataikuLiveAdapter.get_lineage fetches each SQL recipe's body and runs it through the extractor. Recipe-type detection: anything whose type contains sql (sql_query, sql_script, sync-with-SQL-engine). Visual prep / Python / R recipes skip because their payloads are not SQL.
  • Informatica: InformaticaLiveAdapter.get_lineage looks up the mapping by name, fetches /api/v2/mapping/{id}/details, cross-joins sources by targets, and when a source carries customSqlQuery or sqlOverride runs it through the same extractor.

Coverage: bare SELECT, aliased / function-wrapped columns, expression columns, multi-table joins with alias resolution, single-level CTEs, and UNION / INTERSECT / EXCEPT per-leg merge. Failure modes are explicit — unparseable SQL returns an empty dict with a warning; unresolvable upstreams emit an out-column with an empty source list so the cockpit can flag ambiguity instead of silently dropping it.
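To make the column-map shape concrete, here is a small sketch of how a consumer might slice the map per parent (as the dbt edges do) and flag unresolved output columns. It does not call sqlglot; it only works on the {out_col: [(upstream_fqn, upstream_col), ...]} structure described above. The helper names slice_for_parent and unresolved_columns and the sample FQNs are hypothetical.

```python
def slice_for_parent(column_map: dict, parent_fqn: str) -> dict:
    """Keep only the sources that point back to one specific parent."""
    sliced = {}
    for out_col, sources in column_map.items():
        kept = [(fqn, col) for fqn, col in sources if fqn == parent_fqn]
        if kept:
            sliced[out_col] = kept
    return sliced


def unresolved_columns(column_map: dict) -> list:
    """Output columns whose upstream could not be resolved (empty source list)."""
    return sorted(col for col, sources in column_map.items() if not sources)


# Sample data, invented for the example.
column_map = {
    "customer_name": [("dataiku.crm.customers", "name")],
    "total": [("dbt.shop.orders", "amount"), ("dbt.shop.refunds", "amount")],
    "mystery": [],  # extractor saw the column but could not resolve its source
}
```

Keeping the empty list for mystery, rather than dropping the key, is what lets a UI distinguish "no lineage" from "lineage unknown".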

Catalog — unified cross-adapter lineage walk

New endpoint GET /api/catalog/lineage/walk?root_fqn=... stitches edges across every configured adapter instance. It pivots on the FQN's leading tool prefix (dbt., dataiku., kafka., ...) to decide which adapter resolves each hop, then recurses through cross-tool endpoints until it reaches max_depth or convergence. Per-adapter get_lineage failures are logged and skipped so a single half-broken upstream does not blank the walk. The response carries truncated=true when max_depth halted traversal so the cockpit can offer a "go deeper" affordance instead of silently under-reporting.

A tools= filter scopes the walk to a subset (?tools=dbt,ogg for a clean two-system slice). Endpoints reached via a discovered edge still surface as nodes so the cockpit can colour the cross-tool perimeter, but recursion is skipped into filtered-out adapters. The root FQN's own tool is always allowed regardless of the filter.
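The walk described above can be approximated with a breadth-first traversal. The sketch below is not the endpoint's implementation: the adapter interface (a callable from FQN to (upstream, downstream) pairs), the response keys other than truncated, and the conservative rule that any expandable node at max_depth sets truncated are all assumptions.

```python
import logging
from collections import deque
from typing import Optional

log = logging.getLogger("lineage_walk")


def walk(root_fqn: str, adapters: dict, max_depth: int = 3,
         tools: Optional[set] = None) -> dict:
    """Stitch edges across adapters, pivoting on each FQN's tool prefix."""
    root_tool = root_fqn.split(".", 1)[0]
    # The root's own tool is always allowed, whatever the filter says.
    allowed = None if tools is None else set(tools) | {root_tool}
    nodes, edges, truncated = {root_fqn}, set(), False
    queue, queued = deque([(root_fqn, 0)]), {root_fqn}

    while queue:
        fqn, depth = queue.popleft()
        tool = fqn.split(".", 1)[0]
        if tool not in adapters or (allowed is not None and tool not in allowed):
            continue  # node stays visible, but recursion is skipped
        if depth >= max_depth:
            truncated = True  # conservative: more hops may exist past here
            continue
        try:
            found = list(adapters[tool](fqn))
        except Exception:
            log.exception("get_lineage failed for %s; skipping", fqn)
            continue
        for up, down in found:
            edges.add((up, down))
            for endpoint in (up, down):
                nodes.add(endpoint)
                if endpoint not in queued:
                    queued.add(endpoint)
                    queue.append((endpoint, depth + 1))
    return {"nodes": sorted(nodes), "edges": sorted(edges), "truncated": truncated}
```

With a dbt adapter and an OGG adapter registered, a call with tools={"dbt"} still returns the OGG endpoint as a node but never asks the OGG adapter for its own upstreams.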

Catalog — combined-lineage drawer in the Migration Cockpit

Clicking Lineage on any feed row in the Migration Cockpit now opens a side drawer that calls /lineage/walk and renders the combined graph. Two views:

  1. Sources involved — nodes grouped by tool prefix with one coloured badge per adapter family (dbt / dataiku / kafka / ssis / informatica / ogg / excel). Cross-tool stitching is visible at a glance.
  2. Edges — every upstream-to-downstream hop with the recipe or model that produced it; column_map expands inline when sqlglot could parse the source SQL (so the cockpit shows customer_name <- name under a Dataiku-to-dbt edge).

Controls: depth dropdown (1 / 2 / 3 / 5 / 8, default 3) and a "walk truncated" chip when max_depth halted traversal.

A Graph / List toggle adds an SVG graph view: nodes laid out in BFS-depth columns from the root, edges as cubic-bezier curves with arrowheads. Negative columns are upstream hops (left); positive are downstream (right). Node colour by tool prefix; the root carries a black border. Edge style distinguishes column-level (solid amber) from table-level-only (dashed grey) so the user knows which hops sqlglot could parse. Layout is deterministic — same FQNs in, same picture out.

Click-to-highlight isolates a node's transitive upstream and downstream closure: non-connected nodes and edges dim to 0.2 opacity, the selected node gets a thicker amber border, and clicking the background clears the selection. A depth slider in the graph header (1 / 2 / 3 / ... / All) hides nodes whose distance from the root exceeds the chosen value while keeping unreachable orphans visible so the slider never silently swallows them.
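The highlight closure is a pair of graph reachability passes, one following edges backwards and one forwards. The sketch below shows the idea in Python for consistency with the backend examples; the real view presumably runs in the frontend, and the function names closure and opacity are hypothetical.

```python
def closure(selected: str, edges: list) -> set:
    """Nodes in the transitive upstream and downstream closure of `selected`."""
    parents, children = {}, {}
    for up, down in edges:
        children.setdefault(up, set()).add(down)
        parents.setdefault(down, set()).add(up)

    def reach(start, neighbours):
        # Iterative depth-first search over one direction of the graph.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {selected} | reach(selected, parents) | reach(selected, children)


def opacity(node: str, highlighted: set) -> float:
    """Dim everything outside the highlighted closure to 0.2."""
    return 1.0 if node in highlighted else 0.2
```

Note that a sibling input of a downstream node is not part of the closure: it is upstream of a descendant, not of the selected node itself.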

Catalog — Phase 3 cockpit wiring + scenario step

MigrationCockpitPage now reads from /api/catalog/assets (the Phase 3 cache) instead of hitting /feeds every selection change. A Refresh button POSTs to /sources/:name/refresh and re-loads, showing X new, Y updated so the operator sees what moved. A new catalog_refresh scenario step type takes { sources: ['name', ...] } (or omit to refresh every registered adapter) and returns per-source + aggregate counts so scheduled jobs surface drift in the scenario run log.
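A rough sketch of what a catalog_refresh step handler could look like, assuming a registry that maps source names to refresh callables returning {"new": n, "updated": m}. The function name run_catalog_refresh, the registry shape, and the result keys are illustrative assumptions; only the { sources: [...] } input and the per-source plus aggregate counts come from the text above.

```python
def run_catalog_refresh(step: dict, registry: dict) -> dict:
    """Refresh the named sources (or all of them) and aggregate the counts."""
    names = step.get("sources") or list(registry)  # omitted: every adapter
    per_source, totals = {}, {"new": 0, "updated": 0}
    for name in names:
        try:
            counts = registry[name]()  # hypothetical refresh callable
        except Exception as exc:
            # Unknown source (KeyError) or a failure mid-run: record and move on
            # so one bad source does not hide the drift in the others.
            per_source[name] = {"error": f"{type(exc).__name__}: {exc}"}
            continue
        per_source[name] = counts
        for key in totals:
            totals[key] += counts.get(key, 0)
    return {"sources": per_source, "total": totals}
```

Recording the error per source, rather than aborting, keeps the aggregate meaningful in a scheduled run where one adapter is temporarily unreachable.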

Agent Chat — Dataiku-parity overhaul of /chat

"Chat & SQL" is renamed Agent Chat and rebuilt around Dataiku Answers / Agent Connect patterns.

  • UI — a dropdown agent switcher replaces the 20-pill cloud + centre wrap-grid; default is last-used via localStorage and ?agent= URL param takes priority. The hash suffix (e.g. mojl37i3) is stripped from displayed titles via a displayAgentTitle helper, with the raw title preserved in tooltips. The conversations sidebar groups by Today / Yesterday / Last 7 days / Older.
  • Smart Router (Phase 2): POST /api/chat/route runs a gpt-4o-mini classifier (temperature=0, response_format=json_object) that picks the best published agent per question. It short-circuits cheaply when zero or one agent is published, and returns null on classifier failure so the frontend falls back to default chat.
  • Phase 4 polish — conversation export to Markdown via a header button; text-file attachment up to 200 KB (txt, csv, md, json, tsv, log, xml) fenced into the prompt; per-agent Compliance Footer below the input, configurable in Agent Builder next to Welcome Message.
  • Schema — migration 2026-05-19_published_assets_project_cascade.sql purges orphan rows whose parent project is gone and recreates the project_id FK with ON DELETE CASCADE — root cause of the 20-stale-agent accumulation observed on test environments.
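The Smart Router's control flow can be sketched independently of the model call. In the example below the classifier is injected as a plain callable that returns a JSON object string, so no LLM client is needed. The function name route, the agent_id key, and the exact short-circuit return values are assumptions; the text above only states that zero / one agent short-circuits and that classifier failure yields null.

```python
import json
from typing import Callable, Optional


def route(question: str, agents: list,
          classify: Callable[[str, list], str]) -> Optional[str]:
    """Pick an agent id, or None so the caller falls back to default chat."""
    if not agents:
        return None                 # zero agents: nothing to route to
    if len(agents) == 1:
        return agents[0]["id"]      # one agent: skip the classifier call
    try:
        raw = classify(question, agents)   # expected: a JSON object string
        choice = json.loads(raw).get("agent_id")
    except Exception:
        return None                 # classifier failure: fall back
    known = {agent["id"] for agent in agents}
    return choice if choice in known else None
```

Validating the classifier's answer against the known agent ids guards against a model that returns a plausible but non-existent id.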

Agent Reviews — retry, live progress, history filters

A pile of usability fixes on the reviews surface.

  • Live progress: the runner now writes pass / fail / error counts to agent_test_runs after every persisted result (previously only at terminal). The GET endpoint returns tests_total = active_cases x executions_per_test so the page renders an X / Y counter plus a thin amber progress bar next to the Cancel button while a run is in flight. Fail and error sub-counts surface inline once non-zero.
  • Retry failed: POST /reviews/{rid}/runs/{run_id}/retry-failed reads the distinct test_ids whose overall_status was fail or error in the source run and starts a fresh run restricted to that subset. The header shows a "Retry failed (N)" button when the selected run is terminal with a fail/error count above zero.
  • Per-test retry: each row in the Results matrix shows a retry button when any of its results were fail or error. Clicking it spawns a new run scoped to that single test_id. POST /reviews/{rid}/runs accepts an optional {test_ids: [...]} body; None preserves the legacy "Run All" fan-out.
  • History filters: status and agent-version filter dropdowns plus an X of Y count on the History tab. The Results-tab run dropdown prefixes queued / running runs with a play marker so a non-selected in-flight run is obvious without changing selection. The polling loop tightens to 1s for the first 5s of a fresh run (to catch the queued-to-running transition quickly), then settles to 2s.

Webapp builder — closes the JSONB-only gaps

Five long-standing gaps where the builder forced raw JSON edits all get structured editors this release. Net effect: chrome, theme, filters, and card-level overrides are now fully form-driven, with the JSON textarea kept as an escape hatch where it still applies.

  • Theme palette picker: the Page Info aside gets a Theme section with colour pickers for the 7 token slots (including primaryColor, headerBackground, sidebarBackground, sidebarActiveBg, sidebarText, accentText), a cardColorTheme dropdown, and a logo URL field. The save handler sends theme on PUT /publish/assets/{id} so changes round-trip.
  • Chrome toggles UI — 9 webapp-wide toggles surface as switches: alert_pill, clock, share, print, theme_toggle, notifications, user_avatar, help_fab, sidebar_search. When alert_pill is on, text and severity inputs appear; when clock is on, an IANA timezone field appears.
  • Filter slots editor — webapp-wide filter slots get a structured panel: slot key rename + delete, type-specific fields (timezone, param names, mode, emptyValue), with nested presets[] / options[] still as JSON for advanced shapes. Two "+ date" / "+ branch" buttons seed sane defaults (period / branch slot, Asia/Jakarta, multi-select).
  • Row editors for presets + options: BranchSelect.options gets a row editor with value / label / code inputs plus add and delete buttons (numbers stay typed when the value parses cleanly). DateRange.presets gets a row editor with key + label plus a kind dropdown that swaps the body: rolling_days exposes a number input, custom exposes start / end dates, and MTD / QTD / YTD / prev-* need no extra fields. Both keep a JSON toggle with inline parse-error feedback.
  • Per-page filter overrides: FiltersSection is reused scoped to the active page (AppPageSpec.filters), rendered above the webapp-level Filters section. This closes the gap where a Branch Focus page's single-select branch override had to be hand-edited in JSON.

Card-level editing also lands. The override drawer now exposes the three CardRefSpec fields that were schema-only with no UI (title_override, chrome.hide_header, chrome.hide_footer), plus a card duplicate button next to delete. A Swap... button next to the Dashboard and Card line opens the Add Card modal in swap mode: only dashboard_id + card_id change on the selected card; overrides, chrome flags, title override, and layout all carry through. The long-standing pain of deleting and re-adding a card whenever the source dashboard card was renamed is gone.

Page-level mutations:

  • Drag-to-reorder pages — dnd-kit wired on the Pages left rail; each row gets a drag handle, drop reorders both config.nav and config.pages atomically. Up/down buttons stay for keyboard and a11y users. Pages without nav (orphans) append at the tail so nothing gets dropped.
  • Move card to another page — the override drawer gets a "Move to" dropdown listing every other page; selecting one transfers the card with overrides + chrome + title to the bottom of the destination's grid and switches the active page.
  • Duplicate page — a clone button on each Pages row produces <key>_copy with (copy) appended to the title. Cloned card refs get fresh uids so move and delete on the copy stay independent of the source.
  • Cmd+S / Ctrl+S to save — intercepts the browser save shortcut to trigger the in-app save when the bundle is dirty; skipped when already saving or clean. Save button labelled (Cmd-S) so the affordance is discoverable.
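The Duplicate page behaviour can be sketched as a deep copy with re-keyed card uids. The config layout used here (a pages list of dicts with key, title, and cards, plus a nav list of page keys) is an assumption made for the example, as is appending the clone to nav; the builder itself presumably does this in the frontend.

```python
import copy
import uuid


def duplicate_page(config: dict, key: str) -> str:
    """Clone page `key` as `<key>_copy` and return the new key."""
    source = next(page for page in config["pages"] if page["key"] == key)
    clone = copy.deepcopy(source)  # deep copy so overrides are not shared
    clone["key"] = f"{key}_copy"
    clone["title"] = f"{source['title']} (copy)"
    for card in clone.get("cards", []):
        card["uid"] = uuid.uuid4().hex  # fresh uid keeps the copy independent
    config["pages"].append(clone)
    config["nav"].append(clone["key"])  # assumed: clone goes to the nav tail
    return clone["key"]
```

This sketch does not handle the case where <key>_copy already exists; a real implementation would need a collision rule.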

Tests

Coverage backfills for two surfaces that shipped in v0.0.80 without tests: 9 integration tests pin the POST /api/webapps/{key}/pages/{page_key}/header-signals contract (404 routing, static-string fallback, SQL execution, NULL-falls-through, bind-dict filtering, exception-swallow, severity defaults). 8 integration + 4 unit tests cover Phase 3 persistence + /sources/:name/refresh upsert counts + the catalog_refresh scenario step type (per-source aggregation, KeyError capture, mid-run RuntimeError capture).