The agent era did something inconvenient to the security industry: it turned non-human actors into first-class users of enterprise systems. Identity, access, audit, and accountability were all built around the assumption that there's a person on the other end. That assumption is no longer reliable, and the playbooks for fixing it are still being written in real time.
This is the brief about the unglamorous work that determines whether an agent ever ships past pilot.
The identity problem in one sentence
An agent is a process that holds credentials and acts on behalf of someone, often across multiple systems, sometimes for hours, occasionally without a human watching.
Pick that sentence apart and every word is a security problem.
- Credentials. Whose? The user's? The agent's own? Something in between? Different credential models have different blast radii.
- On behalf of. Delegation in classical IAM is shallow. Delegation across a long-running multi-step agent task is deep. Most identity providers are catching up.
- Multiple systems. Each integration point is a place where the agent's authority needs to be evaluated.
- Hours. Token lifetimes were designed around interactive sessions. Long-running agents stress them.
- Without a human watching. The audit log is now the primary control surface, not the backstop.
Frameworks worth borrowing from
A few that have hardened in the last six months:
NIST AI Agent Standards Initiative (launched February 2026) put agent identity and security at the top of its priority list. The headline argument is that every agent should have a unique cryptographic identity bound to an accountable owner. Not an API key in an env var. An identity, with a lifecycle, that you can revoke and audit like a service principal.
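What that looks like in practice is a registry entry, not a secret. Here is a minimal sketch; the field names and the example agent are illustrative, not taken from the NIST initiative:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical agent identity record: the point is that an agent has an
# accountable owner, a key that can be rotated, and a lifecycle state you
# can review and revoke, like a service principal.
@dataclass
class AgentIdentity:
    agent_id: str            # stable, unique identifier (not an API key)
    owner: str               # accountable human or team
    public_key: str          # cryptographic identity; private key lives in a vault, not an env var
    created_at: datetime
    expires_at: datetime     # forces periodic review instead of living forever
    status: str = "active"   # active | suspended | revoked

    def revoke(self) -> None:
        """Revocation is an auditable state change, not a deleted env var."""
        self.status = "revoked"

# Illustrative registry entry for a hypothetical agent.
registry = {
    "invoice-triage-agent": AgentIdentity(
        agent_id="invoice-triage-agent",
        owner="finance-platform-team",
        public_key="-----BEGIN PUBLIC KEY----- ...",
        created_at=datetime(2026, 3, 1, tzinfo=timezone.utc),
        expires_at=datetime(2026, 9, 1, tzinfo=timezone.utc),
    )
}
```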
Cloud Security Alliance: Agentic Trust Framework applies zero-trust principles to agent populations: assume breach, verify every action, scope every permission to the minimum needed, log everything in a tamper-evident way. The frame is familiar. The application is new.
McKinsey: State of AI Trust 2026 is worth reading for the empirical findings: only about a third of organizations report governance maturity at level 3+. Translation: most teams have shipped agents but haven't built the operational scaffolding to run them at scale.
You don't have to adopt any of these wholesale, but you should know they exist so that when leadership asks "what's our framework?" you have something to point at.
Bounded autonomy as the design pattern
The single most important architectural pattern in 2026 agent work is bounded autonomy: the practice of giving an agent a defined sandbox, a defined budget, and a defined commit boundary, and explicitly not trusting it past those boundaries.
In practice that looks like:
- Sandboxed tools. The agent reads from production but writes to staging. A human reviews the staging diff before it is promoted.
- Budget caps. A loop exits when it has used N tokens, made M tool calls, or run for T seconds. Without these, the failure mode is "infinite agent."
- Commit gates. Anything irreversible (sending an email, paying an invoice, merging a PR, deleting a record) requires a human approval step. The agent drafts; the human signs. Both the caps and the gate are sketched after this list.
- Visible plans. If the agent is doing magentic-style planning, the plan is human-readable and interruptible mid-execution.
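A minimal sketch of the caps and the gate in an agent loop; `call_model`, `execute_tool`, and `request_human_approval` are placeholders for whatever your orchestration stack actually provides:

```python
import time

MAX_TOOL_CALLS = 20
MAX_TOKENS = 50_000
MAX_SECONDS = 300
IRREVERSIBLE = {"send_email", "pay_invoice", "merge_pr", "delete_record"}

def run_agent(task, call_model, execute_tool, request_human_approval):
    """Bounded agent loop: hard exits on budget, human sign-off on irreversible actions."""
    start = time.monotonic()
    tokens_used = 0
    for step in range(MAX_TOOL_CALLS):
        # Budget caps: the loop exits on tokens, tool calls, or wall-clock time.
        if tokens_used > MAX_TOKENS or time.monotonic() - start > MAX_SECONDS:
            return {"status": "budget_exhausted", "steps": step}

        action = call_model(task)            # e.g. {"tool": ..., "args": ..., "tokens": ...}
        tokens_used += action.get("tokens", 0)

        if action.get("tool") is None:       # model reports it is done
            return {"status": "complete", "result": action.get("answer")}

        # Commit gate: anything irreversible requires a human approval step.
        if action["tool"] in IRREVERSIBLE and not request_human_approval(action):
            return {"status": "rejected_at_gate", "pending": action}

        execute_tool(action["tool"], action["args"])

    return {"status": "tool_call_budget_exhausted"}
```

The specific limits matter less than the fact that they are hard exits the model cannot talk its way past.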
The teams that ship agents are the teams that treat bounded autonomy as the default and earn looser bounds gradually through evidence, not arguments.
Decision provenance
The bar for what counts as "good logging" has shifted. Capturing the agent's final answer is not enough. You need:
- Inputs considered: what context was loaded, from where.
- Tools available: what the agent could have called, not just what it did call.
- Alternatives evaluated: the choices the model rejected and (where the model surfaces it) why.
- Confidence and uncertainty: model logprobs or self-reported confidence, where available.
- Policy decisions: every guardrail check, allow or deny.
- Outputs and side effects: the result and what changed in the world.
This is what the literature is calling decision provenance. The reason it matters is mundane: when an agent does something wrong, you need enough information to diagnose why, not just what. The teams that get this right early avoid the 2026 version of "the AI did it," a complete loss of organizational accountability.
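One way to make that concrete is to emit one structured record per agent step. A sketch, with illustrative field names rather than any standard schema:

```python
import json
from datetime import datetime, timezone

def log_decision(step_id, inputs, tools_available, tool_chosen,
                 alternatives, confidence, policy_checks, output, side_effects):
    """Emit one decision-provenance record per agent step as structured JSON."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step_id": step_id,
        "inputs_considered": inputs,          # what context was loaded, and from where
        "tools_available": tools_available,   # what the agent could have called
        "tool_chosen": tool_chosen,           # what it actually called
        "alternatives_evaluated": alternatives,
        "confidence": confidence,             # logprobs or self-reported, where available
        "policy_checks": policy_checks,       # every guardrail decision, allow or deny
        "output": output,
        "side_effects": side_effects,         # what changed in the world
    }
    print(json.dumps(record))                 # in practice: ship to your audit store
    return record
```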
Inputs are now an attack surface
Anything entering the agent's context window is potential prompt injection. Anything coming back from a tool is potentially adversarial output dressed as data. The defensive moves are:
- Validate inputs at boundaries. Treat user-provided text, web content, and tool output as untrusted by default. PII and sensitive content detection should run before the model sees the data, not after.
- Constrain outputs. Schema-enforced outputs make injection-driven payloads structurally impossible to express (a sketch follows this list).
- Quarantine new sources. A new MCP server, a new web search, a new document type: all should go through a review before being added to the agent's available context.
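The output-constraint idea in miniature, assuming the `jsonschema` library and a hypothetical ticket-creation tool:

```python
from jsonschema import validate, ValidationError

# Only these fields, only these types: an injected "also wire $10k to..." has
# nowhere to live in this structure.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "maxLength": 120},
        "priority": {"enum": ["low", "medium", "high"]},
        "assignee": {"type": "string", "pattern": r"^[a-z0-9.-]+@example\.com$"},
    },
    "required": ["title", "priority"],
    "additionalProperties": False,
}

def accept_model_output(raw: dict) -> dict:
    """Reject any model output that does not fit the declared schema."""
    try:
        validate(instance=raw, schema=TICKET_SCHEMA)
    except ValidationError as exc:
        raise ValueError(f"Model output rejected by schema: {exc.message}") from exc
    return raw
```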
This is not paranoia. It is the same threat model you'd apply to any system that consumes external input, applied consistently to a system that also makes decisions.
In your M365 environment
The good news: most of what you need exists already.
- Identity. Every Cowork or Copilot Studio agent runs either as the user or as a workload identity in Entra. Treat workload identities like service principals: name, owner, lifecycle, periodic review. Not as fire-and-forget API keys.
- Sensitivity labels and Conditional Access. Already covered in the Work IQ brief. These become primary controls for non-human actors. Audit your label coverage; close the "Everyone except external users" sites.
- Purview and the M365 audit log. This is where decision provenance lives. Enable agent-specific audit categories. Build dashboards that surface unusual patterns: agents reading vastly more documents than humans, agents acting outside business hours, agents with sustained tool-call rates that look like loops. One such check is sketched after this list.
- Approval flows. For any agent that takes irreversible action, route it through Power Automate with a human approval before commit. This is unglamorous and it is the difference between "we have agents" and "we trust our agents."
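As a sketch of the dashboard-feeding checks above, here is the kind of pass you might run over an exported slice of agent activity from the audit log; the column names, thresholds, and export format are assumptions, and in production this would more likely be a KQL query than a script:

```python
import pandas as pd

# Assumed CSV export of agent activity with illustrative columns:
# agent_id, operation, item_count, hour_of_day.
events = pd.read_csv("agent_audit_export.csv")

# Agents reading vastly more documents than a typical human baseline.
reads = events[events["operation"] == "FileAccessed"]
per_agent = reads.groupby("agent_id")["item_count"].sum()
HUMAN_DAILY_BASELINE = 200                      # illustrative threshold
heavy_readers = per_agent[per_agent > 10 * HUMAN_DAILY_BASELINE]

# Agents acting outside an illustrative 07:00-19:00 business-hours window.
after_hours = events[(events["hour_of_day"] < 7) | (events["hour_of_day"] > 19)]

print("Agents with unusually heavy read volume:\n", heavy_readers)
print("After-hours agent events:\n", after_hours["agent_id"].value_counts())
```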
The boring half of agent work is also the half that decides whether an organization scales agents past pilot. The teams that win in 2026 are not the teams with the cleverest prompts. They're the teams whose audit logs you'd be comfortable showing a regulator.
Sources: CSA: Agentic Trust Framework · McKinsey: State of AI Trust 2026 · McKinsey: Trust in the age of agents · AI agent governance: practical guide