AI Incident Response: When the Model Fails in Production

Most organisations I walk into have two incident runbooks on the shelf. One is the security incident runbook: somebody broke in, here is who we call, here is how we contain it. The other is the data incident runbook: a pipeline broke or data leaked, here is the playbook. Both are fine. Both miss the third case that is now showing up every week. The model failed, and nobody broke in.

That is the gap. An AI incident is not a breach and not a pipeline outage; it is a system behaving inside its parameters, producing outputs that are wrong, biased, leaking, or acting on the world in ways nobody authorised. The standard runbook does not have a slot for “roll the model back two versions while preserving the prompt logs as evidence.” I have watched good SOC teams freeze for an afternoon because their muscle memory had no move for it.

This piece is the third runbook: detect, contain, investigate, recover, learn, written for the AI-specific moves a security operations centre will not find in NIST SP 800-61 on its own.

What triggers an AI incident

A breach announces itself with an alert. An AI incident usually does not. The hardest part of this work is agreeing in advance what you will treat as an incident, because most of the signal arrives as something quieter than a SIEM alarm.

Four triggers are worth wiring up before you need them.

Behavioural drift past a threshold. Output quality has slipped. The classifier that ran at one accuracy band is now running at another. This is monitoring’s job, but only if you decided in advance what “out of band” means. Without a threshold, drift looks like noise until it looks like a lawsuit.

A user-reported wrong action. A single complaint is noise. A pattern is the most reliable early signal you will get, because real people see failure modes synthetic tests do not. Triage the inbox as if it were a sensor.

A regulator or customer asking a question you cannot answer in an hour. “Why did your system decline this application?” If reconstructing the answer takes a week, you are already in an incident; you just have not declared it yet.

An autonomous agent that acted outside its lane. The agent moved money, sent an email, updated a record nobody asked it to touch. This one will become the dominant trigger over the next two years. Treat the first occurrence as production-down.
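
"Outside its lane" only means something if the lane is written down. Here is a minimal sketch of that check, assuming each agent has an explicit allowlist of tools. The names `AgentAction`, `ALLOWED_TOOLS`, and `is_outside_lane`, and the example agent and tools, are hypothetical and not taken from any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    agent_id: str
    tool: str      # e.g. "send_email", "update_record"
    target: str    # the resource the action touches


# Hypothetical lane definition: which tools each agent may call.
ALLOWED_TOOLS: dict[str, frozenset[str]] = {
    "support-agent": frozenset({"read_ticket", "draft_reply"}),
}


def is_outside_lane(action: AgentAction) -> bool:
    """True when the agent used a tool it was never authorised to use.

    Unknown agents get an empty lane, so anything they do is out of lane.
    """
    allowed = ALLOWED_TOOLS.get(action.agent_id, frozenset())
    return action.tool not in allowed
```

A check like this would sit wherever agent actions are logged or executed; a `True` result is the trigger, not the containment.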

The art is calibrating the threshold so you are not declaring an incident every Tuesday and not missing the one that matters. Most teams start too loose, get noise fatigue, and end up missing the real thing. Start strict and loosen with evidence.
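
For the drift trigger, "decided in advance" can be as small as a few numbers checked into config. A minimal sketch, assuming you have a window of recent predictions with known outcomes; `DriftThreshold`, `is_out_of_band`, and every number in the usage example are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DriftThreshold:
    """Agreed before the incident: the baseline and how far below it counts."""
    baseline_accuracy: float  # accuracy the model was validated at
    max_drop: float           # absolute drop that counts as "out of band"
    min_samples: int          # do not judge on too little data


def is_out_of_band(outcomes: list[bool], threshold: DriftThreshold) -> bool:
    """Return True when accuracy over the window slips past the agreed limit.

    `outcomes` holds recent predictions, True where the model was right.
    """
    if len(outcomes) < threshold.min_samples:
        return False  # not enough evidence either way
    accuracy = mean(1.0 if ok else 0.0 for ok in outcomes)
    return accuracy < threshold.baseline_accuracy - threshold.max_drop
```

Starting strict would mean a small `max_drop`; loosening it later is a one-line change backed by the incidents you did or did not declare.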

Contain: rollback, kill switch, evidence-preserving downtime

Containment in AI is different from containment in classic security. There is rarely an attacker to kick out. The system itself is the failure surface, and your job is to stop the bleeding without destroying the evidence you will need once the investigation starts.

[Figure: containment flow. Trigger fires and the incident is declared. Snapshot first: model version, prompts, inputs, outputs, logs. Then branch on severity: high blast radius goes to the kill switch (route to human or safe fallback); medium / reversible goes to rollback to the last known-good model version. Then quarantine training data and recent fine-tunes. Then investigation: audit log replay and prompt forensics.]

Three moves matter, in this order. Snapshot first, then act. A model rollback that overwrites the failing weights without preserving them is a forensic disaster; you cannot investigate what you cannot reproduce. The first thirty seconds of containment are about preservation: pin the model version, freeze the prompt logs, capture the inputs and outputs in the window of interest, dump the routing config. Only then do you act.
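
A minimal sketch of the preservation step, assuming the model weights, prompt logs, and routing config are reachable as files. It only records what existed and a hash of each artifact; copying the files to write-once storage is a separate step this sketch does not cover. The function names and the manifest layout are my own assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large weight files do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_snapshot_manifest(
    artifacts: dict[str, Path], out_dir: Path, incident_id: str
) -> Path:
    """Record what existed at declaration time, before any rollback runs."""
    manifest = {
        "incident_id": incident_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": {
            name: {"path": str(path), "sha256": sha256_of(path)}
            for name, path in artifacts.items()
        },
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / f"{incident_id}-manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path
```

The hashes are what let you prove later that the weights you replayed are the weights that failed.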

Then rollback or kill. Rollback to the last known-good model version is the cheap move when the failure looks like it was introduced by a recent change. The kill switch (route to a human, route to a deterministic fallback, refuse the request) is the right move when the blast radius is large and you do not yet know what changed. Both should exist before the incident. Wiring a kill switch under pressure is how you create a second incident.
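
Wired in advance, the choice between the two can be a pair of flags and one routing function. A minimal sketch, assuming the flags live in whatever config store the serving layer already reads; `Route` and `choose_route` are hypothetical names, and human review stands in for any safe fallback.

```python
from enum import Enum


class Route(Enum):
    CURRENT_MODEL = "current_model"
    LAST_KNOWN_GOOD = "last_known_good"
    HUMAN_REVIEW = "human_review"


def choose_route(kill_switch_on: bool, rollback_on: bool) -> Route:
    """Pick where a request goes. The kill switch wins over rollback."""
    if kill_switch_on:
        return Route.HUMAN_REVIEW
    if rollback_on:
        return Route.LAST_KNOWN_GOOD
    return Route.CURRENT_MODEL
```

Giving the kill switch priority is a design choice: if both flags are set in a hurry, the request takes the more conservative path.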

Quarantine the training data and any recent fine-tunes. If the failure traces to a data pipeline (poisoned input, mislabelled batch, drift in upstream sources), the training set is now evidence and a potential reinfection vector at the same time. Lock it. Do not let the next retrain reach for it until it has been audited.
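
One way to make "lock it" enforceable is a gate at the top of the retrain job. A minimal sketch, assuming datasets have stable identifiers and the quarantine list is kept somewhere the job can read; both names below are hypothetical.

```python
class QuarantinedDatasetError(RuntimeError):
    """Raised when a retrain tries to use data that is still under audit."""


def assert_not_quarantined(dataset_ids: list[str], quarantined: set[str]) -> None:
    """Stop the retrain if any requested dataset is on the quarantine list."""
    blocked = sorted(set(dataset_ids) & quarantined)
    if blocked:
        raise QuarantinedDatasetError(
            f"datasets under quarantine: {', '.join(blocked)}"
        )
```

Failing loudly is deliberate here: a retrain that silently skips quarantined data trains on a different set than anyone reviewed.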

The deep dive on the controls that should have been in place upstream sits in Model Risk Management. The deep dive on how the SOC plugs all of this into its existing workflow sits in AI in the SOC.

Investigate: audit log replay and prompt forensics

This is where AI incidents diverge most sharply from the classic playbook. There is no firewall log to grep, no malware sample to reverse. The evidence is in the prompt history, the model version, the retrieval context, and the input distribution. If you did not log it, you cannot investigate it. Full stop.

A workable forensic replay needs four things on tape: the exact model version (weights hash, not just a name), the full prompt as sent (system, user, retrieved context, tool outputs), the model’s response, and the downstream action taken. With those four, you can reconstruct any single failure. Miss any one of them and you are guessing.
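
Those four things can be pinned down as a record type so nothing is logged by accident or left out by accident. A minimal sketch; the field names and the JSON-lines format are my assumptions, not a standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromptParts:
    """The full prompt as sent, not the template it was built from."""
    system: str
    user: str
    retrieved_context: tuple[str, ...] = ()
    tool_outputs: tuple[str, ...] = ()


@dataclass(frozen=True)
class ForensicRecord:
    weights_sha256: str      # exact model version, by hash
    prompt: PromptParts      # everything the model actually saw
    response: str            # what the model returned
    downstream_action: str   # what the system then did with it


def to_log_line(record: ForensicRecord) -> str:
    """Serialise one record as a single JSON line for append-only logging."""
    return json.dumps(asdict(record), sort_keys=True)
```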

The technique most teams underuse is replay against a parallel model. Take the failing inputs, run them through the rolled-back version and the suspect version side by side. The diff is your investigation. If the rollback handles them correctly and the suspect version fails them, the regression is in the model change. If both fail them, the regression is upstream (data, prompts, retrieval).
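
The side-by-side replay reduces to a small bucketing loop. A minimal sketch, assuming each failing input comes with a known expected output and that each model version can be called as a plain function from prompt to response; `replay_side_by_side` and the bucket names are hypothetical.

```python
from typing import Callable

Model = Callable[[str], str]


def replay_side_by_side(
    cases: list[tuple[str, str]],
    rolled_back: Model,
    suspect: Model,
) -> dict[str, list[str]]:
    """Bucket each (prompt, expected) case by which model versions fail it."""
    buckets: dict[str, list[str]] = {
        "model_change": [],    # rollback passes, suspect fails
        "upstream": [],        # both fail: look at data, prompts, retrieval
        "not_reproduced": [],  # suspect passes on replay
    }
    for prompt, expected in cases:
        old_ok = rolled_back(prompt) == expected
        new_ok = suspect(prompt) == expected
        if new_ok:
            buckets["not_reproduced"].append(prompt)
        elif old_ok:
            buckets["model_change"].append(prompt)
        else:
            buckets["upstream"].append(prompt)
    return buckets
```

Exact string equality is the simplest possible notion of "correct" and will be too strict for generative outputs; swap in whatever comparison your evaluation already uses. A case landing in `not_reproduced` is itself a finding, since it suggests the failure depended on something the replay did not capture.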

Prompt forensics is the new specialty. A model can fail because the prompt template was edited in a way nobody noticed, because the retrieval index returned different documents, because a tool the agent called started returning different shapes. Treat every link in the chain as a potential mutator. Diff them all against the last known-good snapshot.
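
Diffing every link is cheap if each snapshot stores a fingerprint per link. A minimal sketch, assuming each link (prompt template, retrieval index version, tool schema, and so on) can be rendered to text; the link names in the usage example are illustrative.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable fingerprint of one link's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_links(known_good: dict[str, str], current: dict[str, str]) -> list[str]:
    """Names of links whose fingerprint differs, or that exist on one side only."""
    names = sorted(set(known_good) | set(current))
    return [name for name in names if known_good.get(name) != current.get(name)]
```

For example, with a trailing space added to a template and a new retrieval index that the known-good snapshot never had:

```python
known_good = {"prompt_template": fingerprint("Answer briefly."),
              "tool_schema": fingerprint("{amount: int}")}
current = {"prompt_template": fingerprint("Answer briefly. "),
           "tool_schema": fingerprint("{amount: int}"),
           "retrieval_index": fingerprint("v2")}
print(changed_links(known_good, current))
# ['prompt_template', 'retrieval_index']
```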

Recover: retrain, retire, or restore with conditions

Three outcomes, and the choice is not always obvious in the moment.

Restore with conditions is the most common. The rolled-back version goes back into production with tighter monitoring, a narrower input filter, or a smaller user cohort. You buy time to fix the suspect version properly. Most incidents end here.

Retrain is the right call when the failure traces to a data problem. Quarantine the bad data, rebuild the training set, retrain, revalidate. This is expensive in time but cheap in long-term risk. Do not rush it; a retrain pushed live under pressure produces the next incident.

Retire is the call nobody likes making. Sometimes the model is in a failure mode the team cannot reliably bound, and the honest move is to take it out of service and replace the capability with a human, a rules engine, or a different model entirely. The hardest part of retirement is admitting it; the second-hardest is communicating it without panic.

Learn: a post-incident review that produces change

The post-incident review is where most of the long-term value sits and where most teams under-invest. Three rules I borrow from operations practice and stretch for AI.

Blameless, but specific. No name-and-shame; full detail on what happened and when. “The on-call missed the alert” is useless. “The alert routing sent severity-2 model-drift alarms to a Slack channel nobody owned overnight” is actionable.

One owner per action item, with a date. A list of “we should consider” items is a list of nothing. Each finding becomes a single owner, a single deadline, and a tracker entry that closes loudly.

Feed it back into the runbook. The third runbook is a living document. Every incident teaches it something. If the review does not produce at least one edit to the runbook, the review missed something.

The frameworks worth keeping open while you write yours are NIST SP 800-61, ISO/IEC 27035-1:2023, and the SANS Incident Handler’s Handbook. Borrow their structure. Add the AI-specific moves above. Do not invent the whole thing from scratch; you will miss the boring parts that matter most under pressure.

Yves Mulkers

Yves Mulkers is a data and AI strategist, founder of 7wData, and a top-ranked voice on data and analytics. He has spent fifteen years on the unglamorous, load-bearing parts of data work: governance, architecture, and quality. He writes about what he sees moving in the field before it reaches the headlines.