AI Security: The Adversarial Layer

The first time I sat in a room where the AI governance committee and the security team finally talked to each other, you could feel the temperature shift. The governance side had spent six months mapping risks, drafting policies, choosing a framework. The security architect listened for twenty minutes, then asked one question: “And who is trying to break it?” Nobody had an answer. The governance program had been built as if the only threat was the model getting things wrong by accident. The security team had been left out of the room because someone, somewhere, had filed AI under “data science”, not under “attack surface”.

That gap is the whole reason this hub exists. Risk management, compliance, and data governance handle the AI working as designed and failing inside its envelope. AI security handles the AI under attack: someone deliberately pushing it outside its envelope, stealing it, poisoning it, or using it as a beachhead into the rest of your stack. Different discipline, different muscle, different people. And almost always, the discipline that arrives last.

What you will learn

  • Why AI security is a distinct layer, not a sub-bullet under risk
  • The shape of the AI attack surface in 2026
  • The five threat classes that should be on every operator’s threat model
  • How the security team and the governance team need to share a table
  • A 60-day starter plan to close the most common gaps

Why security is its own layer

In the weekly practitioner conversation I track, security has been climbing steadily in influence, alongside adjacent topics such as the Security Operations Center and Threat Hunting. The pattern is not subtle: the people doing this work are realising that AI systems are not just risky in the actuarial sense, they are targets. A model serving customer requests is a public API. A model with privileged access to internal systems is a privileged service account. A model trained on proprietary data is an intellectual property store. All three are now in scope for adversaries who, twelve months ago, were busy with phishing kits.

The governance pillars are necessary and they are not sufficient. Risk management asks “what could go wrong here”. Compliance asks “what are we obliged to do”. Data governance asks “is the input trustworthy”. None of those questions assume an opponent. Security assumes the opponent and works backwards. That is a different muscle, and almost always a different reporting line. If your AI program does not have a named security counterpart from week one, you are building the bridge and forgetting the load-bearing wall.

The other thing I keep walking into: governance programs staffed entirely from legal, compliance, and data science miss the threats that have a CVE number attached. The security team has been thinking about adversaries for twenty years. They have the threat-modeling vocabulary, the incident-response playbooks, the SOC integration. Bolt them in late and you spend the first incident discovering they were never wired into the alert path. Bolt them in at week one and you get a program that survives a real attack instead of a tidy paper one.

The shape of the AI attack surface

Figure: the AI attack surface as three layers connected by arrows.

  • Inputs you do not control: user prompts, retrieved documents, tool outputs, upstream API responses
  • The AI system itself: model weights, training data, and fine-tunes; RAG store and indexes; agent and tools with their authorization scope
  • Outputs and actions: generated text, tool calls and writes, downstream system effects

The picture is deceptively simple, and the lesson is in what the arrows hide. Every arrow is an attack vector. Inputs you do not control include user prompts (obvious) but also retrieved documents, tool outputs, and upstream API responses (much less obvious). The core hides the model itself, the retrieval store, and the agent scope. The outputs include text a human reads and actions the system takes on downstream systems before a human can review them.

A traditional web application has a handful of attack surfaces and decades of defensive patterns. An AI application has all the traditional ones plus a new layer where natural language becomes executable, where the model is influenced by anything in its context window, and where the agent acts before review. The defensive patterns for that new layer are still being written. Treating AI security as “regular application security plus a content filter” is the most common mistake I see, and the most expensive.

The five threat classes that should be on every threat model

I keep coming back to the same five buckets when I help a team draft their first AI threat model. They map roughly onto the OWASP Top 10 for LLM Applications and the MITRE ATLAS matrix, but the five-bucket sort is what fits on a whiteboard.

Prompt injection and adversarial inputs. Someone (or some retrieved document) tells the model to ignore its instructions, exfiltrate its system prompt, or call a tool it should not have called. Direct prompt injection comes from the user. Indirect prompt injection (the more dangerous form) comes from content the model retrieves: a webpage, a document, an email. The model cannot tell instructions from data because, at the token level, there is no difference. Deep dive: prompt injection and adversarial inputs.
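One way to act on this is to stop asking the model to tell instructions from data and track provenance outside the model instead. What follows is a minimal sketch, not a complete defence. The source labels, the tool names, and the policy itself (trust the operator prompt and the authenticated user, treat everything retrieved as tainted) are my assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str  # hypothetical labels: "system", "user", "retrieved", "tool_output"
    text: str


# Assumption: the operator prompt and the authenticated user may request writes;
# anything the model fetched on its own may not.
TRUSTED_SOURCES = frozenset({"system", "user"})

# Hypothetical names for tools that change state somewhere downstream.
WRITE_TOOLS = frozenset({"send_email", "update_record"})


def requires_human_approval(context: list[ContextItem], tool_name: str) -> bool:
    """Flag write-capable tool calls made while untrusted content is in the context."""
    tainted = any(item.source not in TRUSTED_SOURCES for item in context)
    return tainted and tool_name in WRITE_TOOLS
```

The check never inspects the text for suspicious phrases. It only asks where the text came from, which is the one thing the application knows and the model does not.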

Model supply chain compromise. The weights you downloaded are not the weights the original author published. The fine-tuning dataset was poisoned. The dependency you pinned was hijacked at the registry. The base model picked up a backdoor at training time that activates on a specific trigger phrase. This category looks exactly like the software supply chain story we have been living for five years, with one twist: a poisoned model is harder to audit than poisoned code, because the malicious behaviour is encoded in billions of floats. Deep dive: securing the AI supply chain.
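The first of those failure modes, weights that differ from what the author published, is the cheapest to check. Here is a minimal sketch that assumes the publisher provides a SHA-256 digest through a channel you trust. The function names are mine.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match(path: Path, expected_sha256: str) -> bool:
    """Compare the local file against the digest obtained out of band (assumed)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A matching digest only proves you hold the file the publisher released. It says nothing about a backdoor that was trained in before release, which is why provenance still matters.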

Training-data poisoning and model extraction. Two opposite-direction problems that share a table. Poisoning is the adversary putting things into your training set; extraction is the adversary pulling things out of your trained model. Both end up at the same place: a model whose behaviour is shaped by an interest other than yours, leaking information you did not intend to expose. Repeated queries to a public model can rebuild proprietary training data; targeted queries can rebuild personal data on individuals. The defensive controls are different from what your existing data team runs.

Data security and privacy for AI. This bucket is the one your security team will recognise instantly: secrets in prompts, PII in logs, training data with hidden classified content, RAG indexes that expose documents the original ACLs would have blocked, prompt logs that become a new sensitive data store nobody planned to manage. AI did not invent these problems. AI multiplied them, because every prompt is potentially a new copy of sensitive data in a new location with a new retention question. Deep dive: data security and privacy for AI systems.
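The RAG case is worth making concrete. This sketch assumes each indexed chunk carries a copy of its source document's ACL, captured at index time, and that the caller's group memberships are known at query time. The `Chunk` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # assumed: copied from the source document's ACL


def filter_by_acl(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks whose source document the caller could have opened directly."""
    return [chunk for chunk in chunks if chunk.allowed_groups & user_groups]
```

The filter has to run before the chunks reach the prompt. Once a blocked document is in the context window, no output filter reliably takes it back out. ACLs copied at index time also go stale, so a real system needs a re-sync path that this sketch leaves out.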

Operational and runtime attacks. Denial of wallet (someone makes you spend a fortune on inference), denial of service (someone fills your context window with junk), excessive agency (you gave the agent too many tool permissions and now an attacker is using them), and the SOC blind spot (your alerting is wired for traditional traffic and never sees the AI-layer attacks at all). The fix is not technical alone. It is wiring AI traffic into the same monitoring fabric as everything else. Deep dive: AI in the SOC.
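Denial of wallet is the easiest of these to cap in code. This sketch assumes you can attribute each request to a caller and estimate its token count before inference. It keeps state in memory, so treat it as an illustration of the control, not a production limiter.

```python
from collections import defaultdict


class TokenBudget:
    """Per-caller daily token cap (in-memory sketch, hypothetical class)."""

    def __init__(self, daily_limit: int) -> None:
        self.daily_limit = daily_limit
        self._spent: dict[tuple[str, str], int] = defaultdict(int)

    def try_spend(self, caller_id: str, day: str, tokens: int) -> bool:
        """Record the spend and return True only if it stays within the cap."""
        key = (caller_id, day)
        if self._spent[key] + tokens > self.daily_limit:
            return False
        self._spent[key] += tokens
        return True
```

A refused request is also a detection signal. Sending it to the SOC, not only back to the caller, is the wiring this bucket asks for.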

The full taxonomy of how all five hit a real system is in the AI attack surface article.

Give security a seat at the governance table on day one

The single behavioural change that has the highest payoff in my client work is also the cheapest: when you stand up the AI governance committee, the security architect (or CISO delegate) is in the founding membership, not invited later. Not consulted. Member.

The cost of doing this early is one extra chair and one extra recurring slot. The cost of doing it after the first incident is a forensic investigation, a regulator letter, and a year of trust-rebuilding with a security team that knows it was excluded from the decision that hurt them. I have watched both sides of that. The first is awkward for a quarter and pays back for years. The second poisons the program.

There is a cooking version of this. Mise en place: get everything on the bench before the heat goes on. The governance program is the heat. The security team is on the mise-en-place list. Bring them at prep, not at plate-up.

What that looks like in practice is small and concrete. The AI inventory is jointly owned by governance and security. The high-tier systems from the risk-tiering pass automatically become in-scope for security review. The SOC has a documented detection-and-response path for AI-layer incidents (prompt injection, exfiltration via model output, agent misuse), with the same severity ladder as everything else. The procurement clause for vendor AI features (see the Shadow AI playbook) carries a security review trigger, not just a compliance one.

The handoff to Agentic AI governance is also security territory. An agent with broad authorisation is a privileged service account; treat it like one. Rotate its credentials, scope its permissions, monitor its actions, and assume it will be targeted.
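Scoping and monitoring can start very small. This sketch assumes every tool call passes through one choke point. It denies anything not on an explicit allowlist and records every decision. The class and tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentScope:
    agent_id: str
    allowed_tools: frozenset[str]
    # Each entry is (agent_id, tool_name, allowed); a real system would ship these to the SOC.
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, tool_name: str) -> bool:
        """Deny by default and log every decision, allowed or not."""
        allowed = tool_name in self.allowed_tools
        self.audit_log.append((self.agent_id, tool_name, allowed))
        return allowed
```

The denied calls are the interesting part of the log. An agent that repeatedly asks for a tool it was never granted is either misconfigured or being steered.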

A 60-day starter plan that produces evidence

You do not need a year. Two months is enough to close the largest gaps and put the rest on a roadmap.

Days 1-20: Threat model the top five systems. Take the five highest-tier AI systems from your governance inventory. For each, walk the five threat classes above with a security engineer and an AI engineer in the same room. Document what is plausible, what is in scope, what is already mitigated, and what is not. The output is a one-page threat model per system, signed by both engineers.

Days 21-40: Wire detection and response. For each threat class on each of the five systems, name the detection (what would tell us this is happening) and the response (what we do when it does). Most of these are extensions to controls the SOC already runs (log aggregation, anomaly detection, on-call rotation), with new sources (prompt logs, retrieval logs, agent action logs) and new playbooks. Run one tabletop exercise: prompt injection on a customer-facing assistant, exfiltrating its system prompt. See where the playbook breaks.
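A simple check helps keep this phase honest: for every system and every threat class, is there a named detection and a named response? The sketch below assumes the plan is kept as a nested mapping. The keys and the short labels for the five threat classes are my own.

```python
THREAT_CLASSES = [
    "prompt_injection",
    "supply_chain",
    "poisoning_extraction",
    "data_security_privacy",
    "operational_runtime",
]


def coverage_gaps(plan: dict[str, dict[str, dict[str, str]]]) -> list[tuple[str, str, str]]:
    """Return (system, threat_class, missing_field) for every unnamed detection or response."""
    gaps = []
    for system, entries in plan.items():
        for threat in THREAT_CLASSES:
            entry = entries.get(threat, {})
            for field_name in ("detection", "response"):
                if not entry.get(field_name, "").strip():
                    gaps.append((system, threat, field_name))
    return gaps
```

An empty result does not mean the detections work. It means each one has been named, and the tabletop exercise is where you find out whether the named ones fire.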

Days 41-60: Close the supply chain and the agent scope. Inventory every model in production with provenance (where it came from, who built it, what fine-tunes are layered on it). Verify checksums where possible, document where not. Audit every agent with a tool-use scope: what can it do, what can it write to, what would the worst case be. Reduce the scope where it is wider than the job requires. The day-60 deliverable is a small, true map of what your AI systems can be attacked through and what would happen if they were.
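The agent audit reduces to a set difference once both sides are written down. This sketch assumes you have, per agent, the permissions it currently holds and the permissions its job actually requires. Both inputs and the function name are hypothetical.

```python
def excess_scope(
    granted: dict[str, set[str]],
    required: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Map each agent to the permissions it holds but its job does not need."""
    report = {}
    for agent, permissions in granted.items():
        # An agent with no documented requirement is treated as needing nothing.
        extra = permissions - required.get(agent, set())
        if extra:
            report[agent] = sorted(extra)
    return report
```

Treating an undocumented agent as needing nothing is deliberate. It makes the missing requirement show up in the report instead of passing silently.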

The reverse order (policy first, technical work later) fails in security the same way it fails in risk management. The policy ends up describing a defence that does not exist. The honest order produces a defence first, then writes the policy that describes it.

Yves Mulkers

Yves Mulkers is a data and AI strategist, founder of 7wData, and a top-ranked voice on data and analytics. He has spent fifteen years on the unglamorous, load-bearing parts of data work: governance, architecture, and quality. He writes about what he sees moving in the field before it reaches the headlines.