v0.0.81 — Catalog Phase 4 live + cross-tool lineage, Agent Chat, Webapp builder polish
Released: 2026-05-20. 38 commits.
A heavy catalog release. Phase 4 closes — every file-inventory adapter (Dataiku, Kafka, Informatica, SSIS, Oracle GoldenGate) now has a live extractor — and the cockpit gets cross-tool lineage stitching, column-level edges via sqlglot, and a real SVG graph view. Separately, the chat surface is rebuilt around Dataiku Answers patterns, Agent Reviews gains retry + live progress, and the Webapp builder closes the last JSONB-only gaps on chrome, theme, filters, and card-level overrides.
Catalog — Phase 4 live extractors complete
The four file-inventory adapters introduced in v0.0.80 all grow a live extraction path. Same factory-dispatch pattern across each — the registry name stays stable (dataiku, kafka, informatica, ssis), and live mode activates the moment its mode-specific config keys are set. File mode stays the default so v0.0.80 tenants need no config change.
- Dataiku DSS: walks /public/api/projects/, enumerates datasets and recipes, and parses recipe inputs[] / outputs[] into lineage edges. Auth is HTTP Basic with api_key as the username (DSS convention). Per-project failures are caught and logged so a half-broken DSS does not blank the catalog.
- Kafka Schema Registry: walks /subjects + /subjects/{s}/versions/latest, folds <topic>-value / <topic>-key subjects under a shared topic property, and surfaces the Avro doc as the description. Optional HTTP Basic for Confluent Cloud.
- Kafka Connect: when connect_url is configured, the same Kafka adapter walks /connectors and emits each one as a connector asset tagged sink or source. get_lineage synthesises connector-topic edges: sink connectors get topic_fqn -> connector_fqn, source connectors get the reverse. Debezium / JDBC prefix-only sources surface a synthetic prefix* topic so the graph never dangles.
- Informatica IICS / IDMC: implements the IICS two-step login protocol (POST /ma/api/v2/user/login, then GET {serverUrl}/api/v2/mapping) plus mtTask enumeration. Mappings surface as mapping assets and tasks as task assets carrying mapping_id, so the cockpit can chain them. PowerCenter (on-prem, pmrep-based) stays in file mode; mixed shops configure two informatica instances.
- SSIS: has no REST API, so live mode parses a directory of checked-out .dtsx XML files. Each package becomes one asset; nested Pipeline Executables surface as dataflow tags; OLE-DB OpenRowset values are captured as referenced tables. Project subfolders map to FQN components so duplicate package names across projects do not collide.
- Oracle GoldenGate .prm config walker: replaces the stub OggAdapter with a real parameter-file parser. Each .prm surfaces as one extract / replicat / pump asset; MAP / TABLE / trail-file statements become REPLICATE edges, including multi-line MAPs. When an EXTRACT writes a trail that any other group declares as input, an extract-to-replicat edge stitches the full CDC chain so the cockpit shows the entire pipeline as one path.
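A minimal sketch of the two Kafka rules above, written against plain Python data rather than live HTTP calls. It assumes subjects follow the default <topic>-key / <topic>-value naming, that connector info is shaped like a Kafka Connect GET /connectors/{name} response (name, type, config), and that a prefix-only source keeps its prefix under topic.prefix. The kafka.topic.* / kafka.connector.* FQN shapes are hypothetical; the adapter's real FQN format is not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fold_subjects(subjects: List[str]) -> Dict[str, Dict[str, str]]:
    """Group Schema Registry subjects by topic, assuming <topic>-key / <topic>-value names."""
    topics: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject in subjects:
        for role in ("value", "key"):
            suffix = f"-{role}"
            if subject.endswith(suffix):
                topics[subject[: -len(suffix)]][role] = subject
                break
    return dict(topics)


def topic_fqn(topic: str) -> str:
    # Hypothetical FQN shape for a topic.
    return f"kafka.topic.{topic}"


def connector_edges(name: str, info: dict) -> List[Tuple[str, str]]:
    """Synthesise (upstream, downstream) edges for one connector."""
    conn = f"kafka.connector.{name}"  # hypothetical connector FQN shape
    config = info.get("config", {})
    if info.get("type") == "sink":
        # Sink: topic -> connector, one edge per topic in the comma-separated list.
        topics = [t.strip() for t in config.get("topics", "").split(",") if t.strip()]
        return [(topic_fqn(t), conn) for t in topics]
    # Source: connector -> topic; a prefix-only source gets a synthetic prefix* topic.
    prefix = config.get("topic.prefix")  # assumed config key
    if prefix:
        return [(conn, topic_fqn(f"{prefix}*"))]
    return []
```

Subjects that end in neither -key nor -value (for example under a record-name strategy) are simply skipped in this sketch.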
OGG also picks up vendor-prefix FQN stitching: with source_vendor / target_vendor config keys plus a parsed SOURCEDB / TARGETDB, edges emit FQNs like oracle.<db>.<schema>.<tbl> instead of the OGG-local namespace. This was the missing piece for cross-tool lineage: an Oracle catalog adapter pointed at the same DB now publishes identical FQNs, so the walk endpoint stitches the hop with no glue code.
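The vendor-prefix rule for OGG MAP statements can be sketched as follows. This is an illustration under stated assumptions, not the adapter's parser: it handles only two-part MAP src, TARGET tgt statements (TABLE and trail-file statements are left out), treats lines starting with -- as comments, and uses a hypothetical ogg.<group> prefix for the OGG-local fallback namespace.

```python
import re
from typing import List, Optional, Tuple

IDENT = r"[\w$#]+"
# MAP schema.table, TARGET schema.table; \s also spans newlines, so multi-line MAPs match.
MAP_RE = re.compile(
    rf"\bMAP\s+({IDENT})\.({IDENT})\s*,\s*TARGET\s+({IDENT})\.({IDENT})", re.IGNORECASE
)
GROUP_RE = re.compile(r"^\s*(EXTRACT|REPLICAT)\s+(\S+)", re.IGNORECASE | re.MULTILINE)


def _param(text: str, name: str) -> Optional[str]:
    """First value of a line-leading parameter such as SOURCEDB or TARGETDB."""
    m = re.search(rf"^\s*{name}\s+({IDENT})", text, re.IGNORECASE | re.MULTILINE)
    return m.group(1) if m else None


def map_edges(
    prm_text: str, source_vendor: Optional[str] = None, target_vendor: Optional[str] = None
) -> List[Tuple[str, str]]:
    # Assumption: only whole-line "--" comments are stripped.
    text = "\n".join(l for l in prm_text.splitlines() if not l.lstrip().startswith("--"))
    group_m = GROUP_RE.search(text)
    fallback = f"ogg.{group_m.group(2) if group_m else 'unknown'}"  # hypothetical namespace
    src_db, tgt_db = _param(text, "SOURCEDB"), _param(text, "TARGETDB")

    def fqn(vendor: Optional[str], db: Optional[str], schema: str, table: str) -> str:
        # Vendor-prefixed FQN only when both the vendor and the parsed DB are known.
        if vendor and db:
            return f"{vendor}.{db}.{schema}.{table}"
        return f"{fallback}.{schema}.{table}"

    return [
        (fqn(source_vendor, src_db, ss, st), fqn(target_vendor, tgt_db, ts, tt))
        for ss, st, ts, tt in MAP_RE.findall(text)
    ]
```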
Catalog — S3 adapter
New S3Adapter walks a bucket prefix via boto3, groups data files (.parquet / .csv / .json / .jsonl / .avro / .orc / .tsv) by their parent prefix, and collapses Hive-style partition segments (dt=2026-05-01/region=us) so a partitioned dataset becomes one CatalogAsset rather than thousands. FQN shape s3.<bucket>.<schema>.<table> matches every other adapter, so the walk endpoint pivots on the s3 prefix without special casing. get_lineage() returns [] — S3 is a storage layer; upstream tools (Dataiku S3 recipe, dbt-spark, OGG bigdata adapter) own the edges that touch s3.* FQNs.
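A sketch of the grouping and partition-collapsing step, operating on a list of object keys such as those collected from a boto3 list_objects_v2 paginator. How the adapter derives <schema> and <table> from a prefix is not spelled out above, so this version assumes the last non-partition segment is the table and the one before it is the schema, falling back to hypothetical default / _root names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

DATA_EXTS = (".parquet", ".csv", ".json", ".jsonl", ".avro", ".orc", ".tsv")
# A Hive-style partition segment looks like key=value (e.g. dt=2026-05-01).
PARTITION_SEGMENT = re.compile(r"^[^=/]+=[^/]*$")


def group_datasets(bucket: str, keys: Iterable[str]) -> Dict[str, List[str]]:
    """Map s3.<bucket>.<schema>.<table> FQNs to the data files that make up each dataset."""
    datasets: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        if not key.lower().endswith(DATA_EXTS):
            continue  # skips markers such as _SUCCESS
        parts = key.split("/")[:-1]  # parent prefix segments
        while parts and PARTITION_SEGMENT.match(parts[-1]):
            parts.pop()  # collapse trailing partition segments into one dataset
        # Assumed naming rule (hypothetical fallbacks for shallow keys).
        table = parts[-1] if parts else "_root"
        schema = parts[-2] if len(parts) >= 2 else "default"
        datasets[f"s3.{bucket}.{schema}.{table}"].append(key)
    return {fqn: sorted(files) for fqn, files in datasets.items()}
```

Because only the last two non-partition segments are used, deeper prefixes such as a/b/c/ and x/b/c/ would collide in this sketch; a real implementation needs a rule for that case.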
Catalog — column-level lineage via sqlglot
New routers/catalog/column_lineage.py extracts a column map ({out_col: [(upstream_fqn, upstream_col), ...]}) from a SQL string plus an upstream resolver. Pure SQL parsing, no adapter coupling — any adapter carrying SQL text can call it.
Wired into three adapters this release:
- dbt: each parent edge now carries the slice of column_map that points back to that specific parent. Table-level edges still emit when SQL parsing fails or the model has only un-rendered Jinja.
- Dataiku: DataikuLiveAdapter.get_lineage fetches each SQL recipe's body and runs it through the extractor. Recipe-type detection matches anything whose type contains sql (sql_query, sql_script, sync-with-SQL-engine). Visual prep / Python / R recipes are skipped because their payloads are not SQL.
- Informatica: InformaticaLiveAdapter.get_lineage looks up the mapping by name, fetches /api/v2/mapping/{id}/details, cross-joins sources by targets, and, when a source carries customSqlQuery or sqlOverride, runs it through the same extractor.
Coverage: bare SELECT, aliased / function-wrapped columns, expression columns, multi-table joins with alias resolution, single-level CTEs, and UNION / INTERSECT / EXCEPT per-leg merge. Failure modes are explicit — unparseable SQL returns an empty dict with a warning; unresolvable upstreams emit an out-column with an empty source list so the cockpit can flag ambiguity instead of silently dropping it.
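The dbt per-parent slicing and the "empty source list" failure mode can both be illustrated on a column_map of the shape given above. The slice format ({out_col: [upstream_col, ...]}) and both helper names are hypothetical; this is not the extractor itself.

```python
from typing import Dict, List, Tuple

# Shape from the notes: {out_col: [(upstream_fqn, upstream_col), ...]}
ColumnMap = Dict[str, List[Tuple[str, str]]]


def slice_for_parent(column_map: ColumnMap, parent_fqn: str) -> Dict[str, List[str]]:
    """Hypothetical helper: keep only the column sources that come from parent_fqn."""
    sliced: Dict[str, List[str]] = {}
    for out_col, sources in column_map.items():
        cols = [col for fqn, col in sources if fqn == parent_fqn]
        if cols:
            sliced[out_col] = cols
    return sliced


def ambiguous_columns(column_map: ColumnMap) -> List[str]:
    """Out-columns whose source list is empty, i.e. the upstream could not be resolved."""
    return sorted(out_col for out_col, sources in column_map.items() if not sources)
```

An expression column such as total can draw on two parents, so it appears in both slices, while mystery (unresolved) is attached to no edge and is reported separately.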
Catalog — unified cross-adapter lineage walk
New endpoint GET /api/catalog/lineage/walk?root_fqn=... stitches edges across every configured adapter instance. Pivots on the FQN's leading tool prefix (dbt., dataiku., kafka., ...) to figure out which adapter resolves each hop, then recurses through cross-tool endpoints until max_depth or convergence. Per-adapter get_lineage failures are logged and skipped so a single half-broken upstream does not blank the walk. The response carries truncated=true when max_depth halted traversal so the cockpit can offer a "go deeper" affordance instead of silently under-reporting.
A tools= filter scopes the walk to a subset of adapters (?tools=dbt,ogg gives a clean two-system slice). Endpoints reached via a discovered edge still surface as nodes so the cockpit can colour the cross-tool perimeter, but the walk does not recurse into filtered-out adapters. The root FQN's own tool is always allowed regardless of the filter.
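A minimal, synchronous sketch of the walk's traversal rules: prefix dispatch, per-adapter failure skipping, the truncated flag, and the tools= filter. Adapters are modelled as plain functions from an FQN to (upstream, downstream) edges, an assumption standing in for each adapter's get_lineage. The sketch sets truncated whenever an expandable node remains at max_depth, which is one plausible reading of "max_depth halted traversal".

```python
import logging
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

log = logging.getLogger(__name__)

Edge = Tuple[str, str]  # (upstream_fqn, downstream_fqn)
LineageFn = Callable[[str], Iterable[Edge]]  # hypothetical stand-in for adapter.get_lineage


def tool_of(fqn: str) -> str:
    """Leading tool prefix, e.g. 'dbt' for 'dbt.orders'."""
    return fqn.split(".", 1)[0]


def walk(root_fqn: str, adapters: Dict[str, LineageFn], max_depth: int = 3,
         tools: Optional[Iterable[str]] = None) -> dict:
    # The root's own tool is always allowed, whatever the filter says.
    allowed = None if tools is None else set(tools) | {tool_of(root_fqn)}
    nodes: Set[str] = {root_fqn}
    edges: Set[Edge] = set()
    truncated = False
    queued = {root_fqn}  # guarantees convergence on cycles
    queue = deque([(root_fqn, 0)])
    while queue:
        fqn, depth = queue.popleft()
        tool = tool_of(fqn)
        if allowed is not None and tool not in allowed:
            continue  # surfaced as a node, but no recursion into this adapter
        get_lineage = adapters.get(tool)
        if get_lineage is None:
            continue  # no configured adapter resolves this prefix
        if depth >= max_depth:
            truncated = True  # an expandable frontier remains
            continue
        try:
            found = list(get_lineage(fqn))
        except Exception:
            log.warning("get_lineage failed for %s; skipping", fqn, exc_info=True)
            continue
        for up, down in found:
            edges.add((up, down))
            for node in (up, down):
                nodes.add(node)
                if node not in queued:
                    queued.add(node)
                    queue.append((node, depth + 1))
    return {"nodes": sorted(nodes), "edges": sorted(edges), "truncated": truncated}
```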
Catalog — combined-lineage drawer in the Migration Cockpit
Clicking Lineage on any feed row in the Migration Cockpit now opens a side drawer that calls /lineage/walk and renders the combined graph. Two views:
- Sources involved: nodes grouped by tool prefix, with one coloured badge per adapter family (dbt / dataiku / kafka / ssis / informatica / ogg / excel), so cross-tool stitching is visible at a glance.
- Edges: every upstream-to-downstream hop with the recipe or model that produced it; column_map expands inline when sqlglot could parse the source SQL (for example, the cockpit shows customer_name <- name under a Dataiku-to-dbt edge).
Controls: depth dropdown (1 / 2 / 3 / 5 / 8, default 3) and a "walk truncated" chip when max_depth halted traversal.
A Graph / List toggle adds an SVG graph view: nodes laid out in BFS-depth columns from the root, edges as cubic-bezier curves with arrowheads. Negative columns are upstream hops (left); positive are downstream (right). Node colour by tool prefix; the root carries a black border. Edge style distinguishes column-level (solid amber) from table-level-only (dashed grey) so the user knows which hops sqlglot could parse. Layout is deterministic — same FQNs in, same picture out.
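Column assignment for the graph view can be sketched as two breadth-first searches from the root: one over reversed edges for upstream columns (negative) and one over forward edges for downstream columns (positive). Sorting neighbours and column members is what makes the layout deterministic in this sketch. The function name is hypothetical, and pixel positions and bezier routing are out of scope.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def layout_columns(root: str, edges: Iterable[Tuple[str, str]]) -> Dict[int, List[str]]:
    """Signed BFS depth per node: negative = upstream of root, positive = downstream."""
    downstream: Dict[str, List[str]] = defaultdict(list)
    upstream: Dict[str, List[str]] = defaultdict(list)
    for up, down in edges:
        downstream[up].append(down)
        upstream[down].append(up)
    column = {root: 0}
    for adjacency, sign in ((upstream, -1), (downstream, 1)):
        seen = {root}
        frontier = [root]
        depth = 0
        while frontier:
            depth += 1
            next_frontier = []
            for node in frontier:
                for neighbour in sorted(adjacency[node]):  # sorted => deterministic
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.append(neighbour)
                        column.setdefault(neighbour, sign * depth)
            frontier = next_frontier
    grouped: Dict[int, List[str]] = defaultdict(list)
    for node, col in column.items():
        grouped[col].append(node)
    return {col: sorted(nodes) for col, nodes in sorted(grouped.items())}
```

A node reachable in both directions (a cycle) keeps its upstream column here; nodes connected to the graph only through mixed-direction paths get no column and would need separate orphan handling.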
Click-to-highlight isolates a node's transitive upstream and downstream closure: non-connected nodes and edges dim to 0.2 opacity, the selected node gets a thicker amber border, and clicking the background clears. Depth slider in the graph header (1 / 2 / 3 / ... / All) hides nodes whose distance from the root exceeds the chosen value while keeping unreachable orphans visible so the slider never silently swallows them.
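The highlight and slider rules reduce to two set computations, sketched below with hypothetical helper names. The closure follows edges strictly upstream and strictly downstream from the selection, so a sibling that merely shares a downstream node is not included. Distance is assumed to be the hop count from the root, and nodes with no known distance are treated as orphans that stay visible.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def connected_closure(selected: str, edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Selected node plus its transitive upstream and downstream nodes."""
    downstream: Dict[str, Set[str]] = defaultdict(set)
    upstream: Dict[str, Set[str]] = defaultdict(set)
    for up, down in edges:
        downstream[up].add(down)
        upstream[down].add(up)
    keep = {selected}
    # Traverse each direction separately so siblings are not pulled in.
    for adjacency in (upstream, downstream):
        stack = [selected]
        seen = {selected}
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        keep |= seen
    return keep


def visible_nodes(nodes: Iterable[str], distance: Dict[str, int],
                  max_distance: Optional[int]) -> Set[str]:
    """Depth-slider filter; max_distance=None stands for 'All'."""
    if max_distance is None:
        return set(nodes)
    # Nodes with no known distance from the root (orphans) always stay visible.
    return {n for n in nodes if n not in distance or distance[n] <= max_distance}
```

One plausible dimming rule on top of this: every node outside the closure is drawn at 0.2 opacity, and an edge stays at full opacity only when both of its endpoints are in the closure.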
Catalog — Phase 3 cockpit wiring + scenario step
MigrationCockpitPage now reads from /api/catalog/assets (the Phase 3 cache) instead of hitting /feeds every selection change. A Refresh button POSTs to /sources/:name/refresh and re-loads, showing X new, Y updated so the operator sees what moved. A new catalog_refresh scenario step type takes { sources: ['name', ...] } (or omit to refresh every registered adapter) and returns per-source + aggregate counts so scheduled jobs surface drift in the scenario run log.
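The catalog_refresh step's contract (an explicit sources list, or every registered adapter when omitted, returning per-source plus aggregate counts) might look like the following. The registry of zero-argument refresh callables, the new / updated count keys, and the result layout are all assumptions. The sketch records an unknown source name (KeyError) and a failing refresh as per-source errors instead of aborting the run.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: each registered adapter exposes a zero-argument refresh that
# returns upsert counts such as {"new": 3, "updated": 1}.
RefreshFn = Callable[[], Dict[str, int]]


def run_catalog_refresh(params: dict, registry: Dict[str, RefreshFn]) -> dict:
    names: Optional[List[str]] = params.get("sources")
    if names is None:
        names = sorted(registry)  # sources omitted: refresh every registered adapter
    per_source: Dict[str, dict] = {}
    totals = {"new": 0, "updated": 0, "failed": 0}
    for name in names:
        try:
            refresh = registry[name]
        except KeyError:
            per_source[name] = {"error": f"unknown source: {name}"}
            totals["failed"] += 1
            continue
        try:
            counts = refresh()
        except Exception as exc:  # e.g. a RuntimeError raised mid-run
            per_source[name] = {"error": str(exc)}
            totals["failed"] += 1
            continue
        per_source[name] = counts
        totals["new"] += counts.get("new", 0)
        totals["updated"] += counts.get("updated", 0)
    return {"sources": per_source, "totals": totals}
```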
Agent Chat — Dataiku-parity overhaul of /chat
"Chat & SQL" is renamed Agent Chat and rebuilt around Dataiku Answers / Agent Connect patterns.
- UI: a dropdown agent switcher replaces the 20-pill cloud + centre wrap-grid. The default is the last-used agent (stored in localStorage), and an ?agent= URL param takes priority. The hash suffix (e.g. mojl37i3) is stripped from displayed titles via a displayAgentTitle helper, with the raw title preserved in tooltips. The conversations sidebar groups by Today / Yesterday / Last 7 days / Older.
- Smart Router (Phase 2): POST /api/chat/route runs a gpt-4o-mini classifier (temperature=0, response_format=json_object) that picks the best published agent per question. It short-circuits cheaply for zero or one agent and returns null on classifier failure so the frontend falls back to default chat.
- Phase 4 polish: conversation export to Markdown via a header button; text-file attachments up to 200 KB (txt, csv, md, json, tsv, log, xml) fenced into the prompt; and a per-agent Compliance Footer below the input, configurable in Agent Builder next to Welcome Message.
- Schema: migration 2026-05-19_published_assets_project_cascade.sql purges orphan rows whose parent project is gone and recreates the project_id FK with ON DELETE CASCADE. The missing cascade was the root cause of the 20 stale agents that accumulated on test environments.
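The Smart Router's short-circuit and fallback rules can be sketched independently of the model call. Here the classifier is injected as a function returning raw JSON text, standing in for the gpt-4o-mini call; the agent_id response field and the agent dict shape are hypothetical. Rejecting an id that is not among the published agents is an extra guard this sketch assumes.

```python
import json
from typing import Callable, List, Optional

# Hypothetical: (question, agents) -> raw JSON text produced by the classifier model.
Classifier = Callable[[str, List[dict]], str]


def route_question(question: str, agents: List[dict], classify: Classifier) -> Optional[str]:
    """Return the chosen agent id, or None so the caller falls back to default chat."""
    if not agents:
        return None  # zero agents: nothing to route to
    if len(agents) == 1:
        return agents[0]["id"]  # one agent: skip the classifier entirely
    try:
        choice = json.loads(classify(question, agents)).get("agent_id")
    except Exception:
        return None  # classifier or JSON failure
    known = {agent["id"] for agent in agents}
    return choice if choice in known else None
```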
Agent Reviews — retry, live progress, history filters
A pile of usability fixes on the reviews surface.
- Live progress: the runner now writes pass / fail / error counts to agent_test_runs after every persisted result (previously only at terminal state). The GET endpoint returns tests_total = active_cases x executions_per_test, so the page renders an X / Y counter plus a thin amber progress bar next to the Cancel button while a run is in flight. Fail and error sub-counts surface inline once non-zero.
- Retry failed: POST /reviews/{rid}/runs/{run_id}/retry-failed reads the distinct test_ids whose overall_status was fail or error in the source run and starts a fresh run restricted to that subset. The header shows a "Retry failed (N)" button when the selected run is terminal with a fail/error count above zero.
- Per-test retry: each row in the Results matrix shows a retry button when any of its results were fail or error. Clicking spawns a new run scoped to that single test_id. POST /reviews/{rid}/runs accepts an optional {test_ids: [...]} body; None preserves the legacy "Run All" fan-out.
- History filters: status + agent-version filter dropdowns plus an X of Y count on the History tab. The Results-tab run dropdown prefixes queued / running runs with a play marker, so an in-flight run is obvious even when it is not selected. The polling loop tightens to 1s for the first 5s of a fresh run (catching queued-to-running quickly), then settles to 2s.
Webapp builder — closes the JSONB-only gaps
Five long-standing gaps where the builder forced raw JSON edits all get structured editors this release. Net effect: chrome, theme, filters, and card-level overrides are now fully form-driven, with the JSON textarea kept as an escape hatch where it still applies.
- Theme palette picker: the Page Info aside gets a Theme section with colour pickers for the 7 token slots (primaryColor, headerBackground, sidebarBackground, sidebarActiveBg, sidebarText, accentText), a cardColorTheme dropdown, and a logo URL field. The save handler sends theme on PUT /publish/assets/{id} so changes round-trip.
- Chrome toggles UI: 9 webapp-wide toggles surface as switches: alert_pill, clock, share, print, theme_toggle, notifications, user_avatar, help_fab, sidebar_search. When alert_pill is on, text and severity inputs appear; when clock is on, an IANA timezone field appears.
- Filter slots editor: webapp-wide filter slots get a structured panel with slot key rename + delete and type-specific fields (timezone, param names, mode, emptyValue), while nested presets[] / options[] stay as JSON for advanced shapes. Two "+ date" / "+ branch" buttons seed sane defaults (period / branch slot, Asia/Jakarta, multi-select).
- Row editors for presets + options: BranchSelect.options gets a row editor with value / label / code inputs plus add and delete buttons (numbers stay typed when the value parses cleanly). DateRange.presets gets a row editor with key + label plus a kind dropdown that swaps the body: rolling_days exposes a number input, custom exposes start / end dates, and MTD / QTD / YTD / prev-* need no extra fields. Both keep a JSON toggle with inline parse-error feedback.
- Per-page filter overrides: FiltersSection is reused, scoped to the active page (AppPageSpec.filters), and rendered above the webapp-level Filters section. This closes the gap where a Branch Focus page's single-select branch override had to be hand-edited in JSON.
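Two of the row-editor behaviours, sketched in Python for illustration even though the builder itself is a UI: keeping option values numeric only when they round-trip cleanly, and checking the extra fields each preset kind needs. "Parses cleanly" is interpreted here as a lossless round trip (so 007 stays a string), and the days / start / end field names are hypothetical.

```python
import math
from typing import Dict, List, Union


def coerce_option_value(raw: str) -> Union[int, float, str]:
    """Return an int or float only when it round-trips to the exact input text."""
    for cast in (int, float):
        try:
            value = cast(raw)
        except ValueError:
            continue
        if isinstance(value, float) and not math.isfinite(value):
            continue  # keep "nan" / "inf" as strings
        if str(value) == raw:
            return value
    return raw


# Hypothetical field names per preset kind; MTD / QTD / YTD / prev-* need none.
PRESET_FIELDS: Dict[str, List[str]] = {"rolling_days": ["days"], "custom": ["start", "end"]}


def missing_preset_fields(preset: dict) -> List[str]:
    required = PRESET_FIELDS.get(preset.get("kind"), [])
    return [field for field in required if preset.get(field) in (None, "")]
```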
Card-level editing also lands. The override drawer now exposes the three CardRefSpec fields that were schema-only with no UI (title_override, chrome.hide_header, chrome.hide_footer), plus a card duplicate button next to delete. A Swap... button next to the Dashboard and Card line opens the Add Card modal in swap mode: only dashboard_id + card_id change on the selected card; overrides, chrome flags, title override, and layout all carry through. This removes the long-standing need to delete and re-add a card whenever its source dashboard card is renamed.
Page-level mutations:
- Drag-to-reorder pages: dnd-kit is wired into the Pages left rail; each row gets a drag handle, and a drop reorders both config.nav and config.pages atomically. Up/down buttons stay for keyboard and a11y users. Pages without a nav entry (orphans) append at the tail so nothing gets dropped.
- Move card to another page: the override drawer gets a "Move to" dropdown listing every other page; selecting one transfers the card with overrides + chrome + title to the bottom of the destination's grid and switches the active page.
- Duplicate page: a clone button on each Pages row produces <key>_copy with (copy) appended to the title. Cloned card refs get fresh uids so move and delete on the copy stay independent of the source.
- Cmd+S / Ctrl+S to save: intercepts the browser save shortcut to trigger the in-app save when the bundle is dirty; skipped when already saving or clean. The Save button is labelled (Cmd-S) so the affordance is discoverable.
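A sketch of the duplicate-page and reorder rules over a hypothetical config shape: nav as a list of {"page": key} items, and pages as a list of page dicts with key, title, and cards that each carry a uid. Returning one new config object with both lists rebuilt is how this sketch models the "atomic" update; it is not the builder's actual code.

```python
import copy
import uuid
from typing import List


def duplicate_page(page: dict) -> dict:
    """Clone a page as <key>_copy with fresh card uids (hypothetical page shape)."""
    clone = copy.deepcopy(page)
    clone["key"] = f"{page['key']}_copy"
    clone["title"] = f"{page['title']} (copy)"
    for card in clone.get("cards", []):
        card["uid"] = uuid.uuid4().hex  # fresh uid so the copy is independent
    return clone


def reorder_pages(config: dict, new_nav_order: List[str]) -> dict:
    """Rebuild nav and pages together; pages without a nav entry go to the tail."""
    nav_by_key = {item["page"]: item for item in config["nav"]}
    pages_by_key = {page["key"]: page for page in config["pages"]}
    nav = [nav_by_key[key] for key in new_nav_order]  # assumes a permutation of nav keys
    ordered = [pages_by_key[key] for key in new_nav_order if key in pages_by_key]
    orphans = [page for page in config["pages"] if page["key"] not in nav_by_key]
    return {**config, "nav": nav, "pages": ordered + orphans}
```

This sketch does not handle a pre-existing <key>_copy page; a real implementation would need a collision rule.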
Tests
Coverage backfills for two surfaces that shipped in v0.0.80 without tests: 9 integration tests pin the POST /api/webapps/{key}/pages/{page_key}/header-signals contract (404 routing, static-string fallback, SQL execution, NULL-falls-through, bind-dict filtering, exception-swallow, severity defaults). 8 integration + 4 unit tests cover Phase 3 persistence + /sources/:name/refresh upsert counts + the catalog_refresh scenario step type (per-source aggregation, KeyError capture, mid-run RuntimeError capture).