Governing Multi-Agent Systems
I was looking at an architecture diagram a client sent me last month. Seven agents, each with a job. A research agent feeding a drafting agent feeding a fact-checking agent feeding a publishing agent, with three sidecar agents (a translation step, a brand-tone check, a compliance review) tapping in along the way. The diagram was beautiful. Clean arrows, clear hand-offs, the kind of picture that gets a head nod in a steering committee.
I asked one question: when the published article is wrong, who can tell me which agent introduced the error? Silence. They had built a system that could ship a mistake in nine seconds and then take nine days to find out why. That gap is the multi-agent governance problem in one sentence.
Why governing many agents is a different problem class
Governing one agent is hard. I covered the basics in my piece on when AI acts on your behalf: authorisation, audit, a hard stop, a human checkpoint on consequential moves. Apply those four to a single agent and you have something defensible.
Multi-agent systems do not just multiply that work by N. They change the shape of it. The reason is composition. A small error in agent A becomes a confident input to agent B, which becomes a downstream action by agent C. Nobody lied. Everybody did their job. The output is still wrong, and the wrongness is now baked into a CRM row, an outbound email, and a publishing queue.
Microservices live with the same chain shape, and we have managed those for fifteen years. The new wrinkle is that each agent’s behaviour is non-deterministic. A microservice that returns the same output for the same input every time is auditable by replay. An agent that hallucinates a customer name on Tuesday and gets it right on Wednesday is not. The chain has stochastic links, and the failure modes compose in ways the old playbook does not catch.
The new failure modes
I have seen four patterns repeat in the multi-agent rollouts I have walked through. They do not show up in the single-agent rulebook.
Coordination errors. Two agents both think they are responsible for the same step, or neither does. The classic case: a “send the email” agent and a “schedule the follow-up” agent that both fire on the same trigger and double-book the prospect. Or worse, both assume the other will do the compliance check, and neither does.
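Both halves of this failure (two owners, or none) can be checked for before anything runs. What follows is a minimal sketch, assuming each agent declares the steps it claims; the `check_step_ownership` helper, the agent names, and the step names are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def check_step_ownership(required_steps, claims):
    """Return (unowned, contested) for a workflow.

    required_steps: iterable of step names the workflow must perform.
    claims: dict mapping agent name -> set of step names that agent claims.
    """
    owners = defaultdict(list)
    for agent, steps in claims.items():
        for step in steps:
            owners[step].append(agent)

    # Steps nobody claims: every agent assumes another one handles it.
    unowned = sorted(s for s in required_steps if s not in owners)
    # Steps claimed by more than one agent: several fire on the same trigger.
    contested = {s: sorted(a) for s, a in owners.items() if len(a) > 1}
    return unowned, contested


# Hypothetical workflow and agents, for illustration only.
required = ["send_email", "schedule_follow_up", "compliance_check"]
claims = {
    "email_agent": {"send_email", "schedule_follow_up"},
    "follow_up_agent": {"schedule_follow_up"},
}

unowned, contested = check_step_ownership(required, claims)
print("unowned:", unowned)
print("contested:", contested)
```

In this made-up configuration the compliance check has no owner and the follow-up has two, which is exactly the pair of mistakes described above.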
Emergent behaviour. The agents do exactly what they were told, individually, and the system as a whole does something nobody designed. A pricing agent and a discount agent in a loop can talk each other down to zero margin in seven exchanges. Nothing in either agent’s instructions said “race to the bottom.” The behaviour emerged from the interaction.
Runaway feedback loops. Agent A’s output becomes agent B’s input becomes agent A’s input again. Without a circuit breaker, the system burns budget, fills queues, or amplifies a small error into a large one before anyone reads a log. The 2010 Flash Crash was the same pattern in algorithmic trading, and the fix then is the fix now: a market-level halt that fires on rate of change, not on any individual actor’s misbehaviour.
Telephone-game degradation. Each agent paraphrases the previous one’s output. By the time the message reaches agent six, the customer’s “I want to defer the payment by a month” has become “the customer requested account closure.” Nobody mis-summarised egregiously. Each hand-off lost a little, and the losses compounded. This is the multi-agent failure I am seeing most often in 2026, and it is the hardest to spot because every individual step looks reasonable.
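The compounding is easy to underestimate. Here is a back-of-the-envelope sketch, assuming purely for illustration that each hand-off keeps a fixed fraction of the meaning it received. The 95% figure and the `retained_after` helper are my assumptions, not measurements, and real meaning loss is not a single number.

```python
def retained_after(handoffs: int, fidelity_per_handoff: float) -> float:
    """Fraction of the original meaning left after a number of hand-offs,
    assuming each hand-off keeps the same fraction of what it received."""
    if handoffs < 0:
        raise ValueError("handoffs must be non-negative")
    if not 0.0 <= fidelity_per_handoff <= 1.0:
        raise ValueError("fidelity_per_handoff must be between 0 and 1")
    return fidelity_per_handoff ** handoffs


# Six agents in a chain means five hand-offs between them.
for agents in range(1, 7):
    handoffs = agents - 1
    print(f"agent {agents}: {retained_after(handoffs, 0.95):.0%} retained")
```

Under that assumption, agent six is working from roughly 77% of what the customer said, even though no single hand-off lost more than 5%.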
What end-to-end trace looks like in practice
The instinct from the microservices world is right, even if it has to be adapted. Every request that enters the system gets a trace ID. Every agent, every tool call, every model invocation tags its log with that ID. When you want to know what happened, you pull the trace and read the whole chain in order. The pattern is well-established for distributed systems; the OpenTelemetry spec is the lingua franca.
What is different for agents: the trace has to capture more than calls and timings. It has to capture the prompt that went in, the model output that came out, the tool the agent chose to call, and the rationale (when the agent provides one). Without that, the trace tells you what happened structurally but not why the agent made the choice it made. The why is the part you need when you are debugging a wrong answer.
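To make that concrete, here is a minimal sketch of what one hop in such a trace might record. It uses only the Python standard library and is not the OpenTelemetry API; the `AgentStep` and `TraceLog` names, the field names, and the sample values are assumptions for illustration.

```python
import json
import uuid
from dataclasses import dataclass, asdict, field
from typing import Optional


@dataclass
class AgentStep:
    """One hop in the chain, tagged with the trace ID it belongs to."""
    trace_id: str
    agent: str
    prompt: str                        # what went in
    output: str                        # what came out
    tool_called: Optional[str] = None  # the tool the agent chose, if any
    rationale: Optional[str] = None    # only when the agent provides one


@dataclass
class TraceLog:
    """In-memory stand-in for a log store that can be queried by trace ID."""
    steps: list = field(default_factory=list)

    def record(self, step: AgentStep) -> None:
        self.steps.append(step)

    def reconstruct(self, trace_id: str) -> list:
        # Steps are appended in call order, so filtering keeps the chain order.
        return [s for s in self.steps if s.trace_id == trace_id]


log = TraceLog()
trace_id = uuid.uuid4().hex  # minted once, when the request enters the system

log.record(AgentStep(trace_id, "research", prompt="Find sources on X",
                     output="Three sources found", tool_called="web_search",
                     rationale="Query was factual, so search first"))
log.record(AgentStep(trace_id, "drafting", prompt="Three sources found",
                     output="Draft v1"))
log.record(AgentStep(uuid.uuid4().hex, "research", prompt="Unrelated request",
                     output="Unrelated output"))

# Pull the whole chain for one request, in order, from a single ID.
for step in log.reconstruct(trace_id):
    print(json.dumps(asdict(step)))
```

The design choice that matters is that the trace ID is created once at the entry point and passed along, never regenerated by an agent mid-chain.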
A useful rule of thumb: if you cannot reconstruct an incident from a single trace ID in under ten minutes, you do not have observability, you have logs. Logs without a join key are an archive, not a debugging tool.
The governance moves that actually work
Four moves carry most of the protective weight in a multi-agent system. None of them are exotic. All of them have to be wired on purpose.
Per-agent authorisation. Each agent gets its own credentials, its own scope, its own rate limits. The research agent can read; it cannot send email. The publishing agent can write to the CMS; it cannot touch the CRM. The principle of least privilege is older than I am, and it still works. The lazy pattern (one shared service account that all agents use) is how a small bug in one agent becomes a permission to do anything, from anywhere in the chain.
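A deny-by-default scope check is enough to show the idea. This is a sketch under stated assumptions: the scope strings, the `AGENT_SCOPES` table, and the `authorise` helper are hypothetical, and rate limits and real credentials are left out for brevity.

```python
class ScopeError(PermissionError):
    """Raised when an agent attempts an action outside its granted scope."""


# Hypothetical scope table: one entry per agent, never a shared account.
AGENT_SCOPES = {
    "research": {"web:read"},
    "publishing": {"cms:write"},
}


def authorise(agent: str, action: str) -> None:
    """Deny by default: unknown agents and unlisted actions are both refused."""
    allowed = AGENT_SCOPES.get(agent, set())
    if action not in allowed:
        raise ScopeError(f"{agent!r} is not permitted to perform {action!r}")


def attempt(agent: str, action: str) -> bool:
    """Convenience wrapper that reports the decision instead of raising."""
    try:
        authorise(agent, action)
        return True
    except ScopeError:
        return False


print(attempt("research", "web:read"))     # within scope
print(attempt("research", "email:send"))   # research cannot send email
print(attempt("publishing", "crm:write"))  # publishing cannot touch the CRM
```

With a shared service account, all three calls would succeed, and a bug in the research agent would carry the publishing agent’s permissions with it.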
End-to-end trace IDs. Covered above. The non-negotiable: a single ID that survives every hop, attached to every log line, queryable in under a minute.
System-level circuit breakers. A halt that fires on system-wide signals, not individual ones. Agent calls per minute exceeded a threshold? Halt. Same trace ID has looped through agent A more than three times? Halt. Budget for a single workflow exceeded N euros of model spend? Halt. These are dumb checks, deliberately. The smart checks belong inside the agents; the circuit breakers are the limiter on the main desk, the one that does not need to understand the music to cap the volume.
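The three checks in that paragraph fit in a few dozen lines. Here is a minimal in-memory sketch, assuming a single process and an estimated cost supplied before each call; the `CircuitBreaker` class, its parameter names, and the limits are illustrative assumptions, not a reference implementation.

```python
import time
from collections import defaultdict, deque


class SystemHalted(RuntimeError):
    """Raised when a system-level limit trips; callers should stop the workflow."""


class CircuitBreaker:
    def __init__(self, max_calls_per_minute, max_visits_per_agent, max_spend_eur,
                 clock=time.monotonic):
        self.max_calls_per_minute = max_calls_per_minute
        self.max_visits_per_agent = max_visits_per_agent
        self.max_spend_eur = max_spend_eur
        self._clock = clock
        self._call_times = deque()        # timestamps of recent calls, system-wide
        self._visits = defaultdict(int)   # (trace_id, agent) -> visit count
        self._spend = defaultdict(float)  # trace_id -> model spend in euros

    def check(self, trace_id, agent, cost_eur):
        """Call before every agent invocation; raises SystemHalted if a limit trips."""
        now = self._clock()
        # Drop calls that have fallen out of the 60-second window.
        while self._call_times and now - self._call_times[0] >= 60:
            self._call_times.popleft()
        self._call_times.append(now)
        if len(self._call_times) > self.max_calls_per_minute:
            raise SystemHalted("call rate exceeded")

        self._visits[(trace_id, agent)] += 1
        if self._visits[(trace_id, agent)] > self.max_visits_per_agent:
            raise SystemHalted(f"trace {trace_id} looped through {agent} too often")

        self._spend[trace_id] += cost_eur
        if self._spend[trace_id] > self.max_spend_eur:
            raise SystemHalted(f"trace {trace_id} exceeded its budget")


breaker = CircuitBreaker(max_calls_per_minute=100, max_visits_per_agent=3,
                         max_spend_eur=5.0)
halt_reason = None
try:
    # Simulate agents A and B feeding each other with no exit condition.
    for _ in range(10):
        breaker.check("trace-1", "agent_a", cost_eur=0.10)
        breaker.check("trace-1", "agent_b", cost_eur=0.10)
except SystemHalted as exc:
    halt_reason = str(exc)

print(halt_reason)
```

With these illustrative limits the loop is stopped on agent A’s fourth visit. Note that the breaker knows nothing about what the agents are saying to each other; it only counts.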
A named orchestrator with audit ownership. Somewhere in the org chart, one human owns the end-to-end behaviour of the multi-agent system, not the individual agents. They sign the audit report. They get paged when the trace shows degradation. They have authority to pull any single agent out of the chain. Distributed accountability is no accountability; I made that point in the framework piece, and it is doubly true here because the temptation to spread the blame across seven agents is enormous.
Why “just observe it” is not enough
Observability is necessary, and it is not sufficient. I am seeing a lot of teams stop at the dashboard. They wire traces, they pipe metrics, they build a beautiful Grafana view, and they declare governance done. They have built a sensor system. They have not built a control system.
The difference matters. A sensor system tells you, after the fact, that something went wrong. A control system stops something from going wrong, or interrupts it while it is going wrong. Multi-agent systems move too fast for human-in-the-loop observation. By the time you read the dashboard, the chain has fired three more times. The observability has to feed automated responses (the circuit breakers above, automated kill switches on out-of-bound traces, automated rollback when a downstream system flags an anomaly) before it earns the word “governance.”
I think of it the way a chef thinks about a kitchen at service. Mise en place is not just having the ingredients ready; it is having the rules ready. If the pass calls and nobody answers in twenty seconds, the dish goes back. If the sauce breaks, the line stops and resets. The rules fire automatically because nobody has time to debate them at service. Multi-agent systems run at service speed all the time. The rules have to fire that way too.
Where this goes next
The honest position in 2026: most teams shipping multi-agent systems are six months ahead of their governance. The agents work; the guardrails are still PowerPoint. The teams pulling away from the pack are the ones treating orchestration as a discipline of its own, not a side effect of stitching agents together. Trace first, halt second, audit third. Then write the policy.
The single biggest predictor I have of a multi-agent system surviving the year is the answer to the question I opened with. When the output is wrong, can you tell me which agent introduced the error, in under ten minutes, from a single trace? If the answer is yes, you have built a system you can govern. If the answer is no, you have built a system that will, eventually, govern you.


