Skip to main content
Version: v0.0.82

v0.0.72 — Team Reviews + reviewer queue

Released: 2026-05-15. Five commits.

This release layers a team-review surface on top of the Agent Review subsystem from v0.0.70 — teams can now disagree with the LLM judge without overwriting its verdict, and a queue inbox surfaces what still needs attention.

Reviewer notes + per-trait thumbs (#19 first slice)

Schema migration add_agent_team_reviews.sql adds reviewer_notes / user_id / reviewed_at to agent_test_results plus a new agent_test_trait_votes table (UNIQUE per result/trait/user, flippable up/down).

Four endpoints under /api/agent-reviews:

  • PATCH .../results/{rid} — save reviewer notes, stamp the audit.
  • POST / DELETE .../results/{rid}/traits/{name}/vote — flip a thumb.
  • GET .../runs/{rid}/results extended with the votes aggregate + notes, all in one batched SQL (no N+1).

ResultDrawer gains a debounced-autosave notes textarea on the Details subtab and per-trait thumbs (optimistic update, rollback on error) on the Traits subtab.

Contested pill + vote counts

TraitsMatrix surfaces team disagreement at a glance:

  • Contested pill renders when reviewer thumbs lean opposite the judge's verdict (pass with majority 👎, or fail with majority 👍).
  • When not contested, raw 👍 N / 👎 N counts show inline.

Drives "did v4 regress, or does the team agree it's fine?" scans without opening each result drawer.

Reviewer queue inbox

GET /api/agent-reviews/queue — org-scoped inbox of every result still needing attention. Needs review = reviewed_at IS NULL AND (overall_status in fail/error OR has a contested trait). Contested is computed inline via a CTE so the queue stays in one query.

New page at /agent-reviews/queue with All / Failures / Contested filter pills (live counts) and a Review queue button on the Reviews list page. Row click jumps to the review detail with run + result query params — AgentReviewPage reads ?run=X&result=Y on mount, selects that run (falls back to most-recent if the id isn't in the loaded list), jumps to the Results tab, and opens ResultDrawer on the matching result. Query params clear after consumption so re-renders don't re-trigger.

Closes the queue → drawer loop — reviewers click a queue row and land directly on the result they need to triage.