The evidence graph: watching the agent show its receipts


Under each agent answer, a live crawl plays the agent reading the real context graph, then collapses to an inspectable graph cross-linked to the prose.

trust
provenance
evidence graph

TL;DR

The deepest trust problem is not whether the agent will do the wrong thing; it is whether you can believe what it just told you.

Under each agent answer, a live crawl plays the agent reading the real context graph, then collapses to an inspectable graph whose nodes are cross-linked to the answer's prose via interactive citations.

The problem

Fluent answers are not the same as correct answers, and acting on a confident but wrong claim in infra is expensive.

Users had no way to interrogate why the agent said what it said, so they trusted blindly or not at all.

Process

I mapped how engineers trust a colleague's claim: they ask how you know, and they expect a chain back to a source. That is the whole idea here: trust is provenance made watchable and traversable.

A static citation list was not enough, because infra reasoning is relational and temporal, so the evidence should show the agent moving through it rather than list its sources.

I also prototyped and rejected a trust ledger sidebar; it tested as clutter that nobody opened twice, which is how I learned that provenance had to live inside the answer, not beside it.

The solution

A live crawl of the context graph, where nodes light up as they are read and discarded ones fall away, collapsing to an inspectable React Flow graph.

The live crawl: the agent reading the context graph while it thinks
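
One way to picture the crawl is as a stream of small events folded into a graph state, which the UI can replay and then collapse. The sketch below assumes that framing; the event kinds, the `CrawlState` shape, and the node ids are all hypothetical and not taken from the shipped feature.

```typescript
// Hypothetical event and state shapes, for illustration only.
type CrawlEvent =
  | { kind: "read"; nodeId: string }
  | { kind: "discard"; nodeId: string }
  | { kind: "done" };

type NodeStatus = "read" | "discarded";

interface CrawlState {
  status: Map<string, NodeStatus>; // latest status per visited node
  collapsed: boolean;              // true once the crawl has finished
}

function initialState(): CrawlState {
  return { status: new Map<string, NodeStatus>(), collapsed: false };
}

// Pure reducer: every event yields a new state, so the crawl can be replayed.
function reduceCrawl(state: CrawlState, event: CrawlEvent): CrawlState {
  if (event.kind === "done") {
    return { status: state.status, collapsed: true };
  }
  const status = new Map(state.status);
  status.set(event.nodeId, event.kind === "read" ? "read" : "discarded");
  return { status, collapsed: state.collapsed };
}

// Nodes that survive into the collapsed, inspectable graph.
function keptNodes(state: CrawlState): string[] {
  return Array.from(state.status.entries())
    .filter(([, nodeStatus]) => nodeStatus === "read")
    .map(([id]) => id);
}

// Made-up crawl: one node is read and then discarded.
const crawlEvents: CrawlEvent[] = [
  { kind: "read", nodeId: "vpc-main" },
  { kind: "read", nodeId: "iam-role-ci" },
  { kind: "discard", nodeId: "iam-role-ci" },
  { kind: "read", nodeId: "s3-logs" },
  { kind: "done" },
];

const finalCrawlState = crawlEvents.reduce(reduceCrawl, initialState());
console.log(keptNodes(finalCrawlState)); // [ 'vpc-main', 's3-logs' ]
```

Keeping the reducer pure is a design choice for this sketch: the same event log can drive the live animation and rebuild the collapsed graph later.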

Node drill-down to the exact provenance record.

Node drill-down: the exact provenance record behind a claim
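
As a rough illustration of what a drill-down could resolve to, the sketch below looks up the records behind a claim by node id and reports any cited node that has no record. The `ProvenanceRecord` fields, the `Claim` shape, and the sample data are assumptions made for this example, not the product's schema.

```typescript
// Hypothetical provenance record; field names are illustrative assumptions.
interface ProvenanceRecord {
  nodeId: string;
  source: string;     // where the fact came from, e.g. a state file
  observedAt: string; // ISO 8601 timestamp of when it was read
  excerpt: string;    // snippet shown to the user
}

interface Claim {
  text: string;
  nodeIds: string[]; // graph nodes the claim cites
}

function indexRecords(records: ProvenanceRecord[]): Map<string, ProvenanceRecord> {
  const byId = new Map<string, ProvenanceRecord>();
  for (const record of records) {
    byId.set(record.nodeId, record);
  }
  return byId;
}

// Resolve a claim to its records and list citations with no backing record.
function drillDown(
  claim: Claim,
  index: Map<string, ProvenanceRecord>
): { records: ProvenanceRecord[]; missing: string[] } {
  const records: ProvenanceRecord[] = [];
  const missing: string[] = [];
  for (const id of claim.nodeIds) {
    const record = index.get(id);
    if (record) {
      records.push(record);
    } else {
      missing.push(id);
    }
  }
  return { records, missing };
}

// Made-up data: the claim cites two nodes, only one has a record.
const recordIndex = indexRecords([
  {
    nodeId: "s3-logs",
    source: "terraform state",
    observedAt: "2024-01-01T00:00:00Z",
    excerpt: "bucket versioning enabled",
  },
]);

const drillResult = drillDown(
  { text: "The log bucket is versioned.", nodeIds: ["s3-logs", "nat-gw-2"] },
  recordIndex
);
console.log(drillResult.records.length, drillResult.missing);
```

Surfacing `missing` explicitly means an ungrounded citation shows up as a visible gap instead of silently disappearing.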

Clickable citations that light the matching node.
The whole answer is built for trust: streamed reasoning, typed value badges (red for a cost anomaly, green for a saving), and a guarded rollback that asks before acting.
A sanitizer plus a leak-guard test ensure no raw query ever reaches the browser, so designing the trust feature also meant designing what the user is not allowed to see.
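
A minimal sketch of how a sanitizer and a leak guard can work together, assuming an allowlist approach: the sanitizer copies only known-safe fields, and the guard serializes the outgoing payload and fails if any raw query text survives. The `InternalNode` and `PublicNode` shapes, the function names, and the sample query are hypothetical.

```typescript
// Hypothetical internal node shape: rawQuery must never leave the server.
interface InternalNode {
  id: string;
  label: string;
  rawQuery: string;
}

// The only shape the browser is allowed to receive.
interface PublicNode {
  id: string;
  label: string;
}

// Allowlist sanitizer: copy known-safe fields instead of deleting unsafe
// ones, so a field added later is excluded by default.
function sanitize(node: InternalNode): PublicNode {
  return { id: node.id, label: node.label };
}

// Leak guard: serialize the payload exactly as it would be sent and throw
// if any raw query text appears anywhere in it.
function assertNoLeak(payload: unknown, rawQueries: string[]): void {
  const wire = JSON.stringify(payload);
  for (const query of rawQueries) {
    if (query.length === 0) continue;
    // Compare against the JSON-escaped form so quotes and backslashes match.
    const needle = JSON.stringify(query).slice(1, -1);
    if (wire.indexOf(needle) !== -1) {
      throw new Error("raw query leaked into browser payload");
    }
  }
}

// Made-up node with a made-up query string.
const internalNodes: InternalNode[] = [
  {
    id: "iam-role-ci",
    label: "CI role",
    rawQuery: 'MATCH (r:Role {name: "ci"}) RETURN r',
  },
];

const browserPayload = internalNodes.map(sanitize);
assertNoLeak(browserPayload, internalNodes.map((n) => n.rawQuery)); // passes
```

Checking the serialized string, not individual fields, is deliberate in this sketch: it also catches a query that was copied into a label or nested object by mistake.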

Impact

Shipped internally, behind a feature flag, and still awaiting review, so there are no adoption numbers yet.

In every demo it answers the single hardest objection: whether the answer can be trusted rather than taken on faith.

Reflection

Reasoning-state explains intent, the approval gate controls action, and the evidence graph explains grounding.

Grounding is where infra trust actually lives.