Trust Boundaries, Not Prompts: The AgLabs Model for A2A Security

Trust Boundaries, Not Prompts: The AgLabs Model for A2A Security

One of the first questions we get when introducing our novel Agent-to-Agent (A2A) approach for the insurance market is simple: “What about security?”

More specifically:

  • What about prompt injection?

  • What about data exfiltration?

  • What about unauthorised access between agents?

These are not theoretical concerns. They are real, well-documented failure modes of modern Agentic AI systems. And are often misunderstood.

This post outlines our honest assessment of those risks, where they materialise for us, and why the AgLabs approach is deliberately designed to contain them.


The new threat model: from software bugs to behavioural exploits

Traditional software security is about controlling what code can do. Agentic AI systems introduce something different: they can be manipulated through what they are told.

Two ideas from recent research (well articulated by Simon Willison in his blog) are particularly useful here:

1. The “lethal trifecta”

Agentic AI systems become dangerous when they combine three elements:

  • Access to sensitive data

  • Ability to externally communicate

  • Exposure to external input (untrusted)

The lethal trifecta (as popularised by Simon Willison)

That combination creates the conditions for data exfiltration or unintended actions.

2. The “rule of two”

Recent work by Meta highlights that If a system connects any two of the above, we should assume some risk.

If it connects all three, we should assume serious consequences are possible.

This framing is helpful because it shifts the conversation from “can prompt injection happen?” (it can)

to: “what can it actually do if it does?”

In what follows, we’ll show why in the AgLabs A2A model the answer is: not much, mainly because our A2A approach constrains how these capabilities can interact.


Where most Agentic AI systems can go wrong

Most current AI Agentic applications are built like this:

Capability

Typical setup

Data access

Broad internal datasets

Actions

API calls, workflows, automation

Inputs

Emails, documents, user prompts

This creates a classic lethal trifecta: sensitive data, external action capability , untrusted input.

In that setup, a prompt injection is not just possible, it is meaningful. An attacker could:

  • Extract data they shouldn’t see

  • Trigger actions they shouldn’t control

  • Move laterally across systems


Our A2A approach changes the problem entirely

In our AgLabs Agent-to-Agent workflow, the purpose of the system is explicit: exchange information between known counterparties until a risk is decision-ready.

That changes the security model fundamentally.

Let’s walk through a typical interaction:

  1. A party (eg broker agent) sends a submission

  2. Another party (eg underwriter agent) receives it

  3. The underwriter agent asks for clarifications

  4. The broker agent responds

  5. The loop continues until the risk is ready

This is not a system trying to protect data from the counterparty.

It is a system explicitly designed to share data with that counterparty.

It’s also worth recognising a practical reality of today’s market: often market participants already overexpose information by default. For instance, in order to avoid repeated back-and-forth with multiple underwriters, brokers routinely share more data than any single counterparty strictly requires.

The key point is not that A2A results in more data being shared than today’s email-based approach (in practice, both involve ‘overexposure’).

The difference is in how that overexposure is handled: in our A2A design, data sharing becomes explicit, scoped, and auditable, rather than implicit and uncontrolled.

What is currently implicit and uncontrolled becomes programmable and governed.


So what happens if prompt injection occurs?

Let’s assume the worst case: a malicious individual or compromised agent attempts to inject prompts into the other agent during an A2A exchange.

What can it actually achieve? The answer is simple: it can 'exfiltrate' information that the other party was already willing to share, and nothing more. Why? Because of how access is defined: the boundary of exposure is defined upfront, and enforced by design.

Also, it is important to note that in our approach, A2A is not where decisions are made:

  • Agents handle coordination, clarification, and data exchange

  • Decision-making sits outside the A2A loop (within deterministic rule systems, and humans reviewing and making the final call)

This means that even if an agent’s behaviour is influenced during an interaction, It cannot directly trigger binding decisions or actions.

Which further limits the practical impact of prompt injection.


Security is defined at the boundary, not inside the model

In the AgLabs architecture, agents do not have open-ended access to internal systems.

They operate within explicit, pre-defined data scopes.

Before any interaction begins:

  • Each organisation defines what data can be shared

  • For a given submission or business line, a bounded dataset is exposed

  • The agent can only operate within that dataset

This means:

  • There is no ‘hidden’ database to exfiltrate

  • There is no broader system access to escalate into

  • There is no “surprise” data beyond what was already in scope

In other words, even if an agent is manipulated, it cannot exceed the permissions of the interaction.


The real control point: who is allowed to talk to your agent

This leads to the most important security principle our A2A approach: security is primarily about connection control, not prompt control.

Control layer

What it governs

Identity

Who the counterparty is

Permissions

What they are allowed to do

Scope

What data is exposed

Context

When and why interaction is allowed

At AgLabs, this is handled through granular, permissioned connectivity:

  • Agents only communicate with approved counterparties

  • Permissions can be scoped by:

    • Organisation

    • Team

    • Line of business

    • Active commercial relationships (e.g. TOBAs)

  • All communication is authenticated and authorised (tokens, credentials, etc.)

If an agent is not explicitly trusted: it cannot connect. Full stop.


What about unauthorised access?

The question then becomes familiar:

“What if someone gains access to an agent they shouldn’t have access to?”

At that point, we are no longer in “Agentic AI risk”.

We are in well-known security territory:

Risk

Equivalent in traditional systems

Agent impersonation

Stolen API credentials

Data access abuse

Database breach

Unauthorised calls

API misuse

Mitigations are well understood:

  • Strong authentication

  • Short-lived credentials

  • Audit logs

  • Monitoring and anomaly detection

In other words, the risk is not new. The interface is.


Why the lethal trifecta doesn’t apply (in the same way)

Element

Traditional AI systems

AgLabs A2A model

Sensitive data

Broad, often implicit access

Explicit, scoped per interaction

External communication

Open-ended (APIs, web, plugins..)

Only with authorised counterparties

Untrusted input

Open-ended (users, web, files)

Restricted to authorised agents

The issue is not whether the three elements exist - they do!

The issue is whether they can be combined in a way that allows an attacker to extract new information or trigger unintended behaviour.

In the AgLabs A2A model:

  • Inputs only come from authorised counterparties

  • The agent only has access to pre-approved, scoped data for that interaction

  • External communication is limited to those same counterparties

Therefore, there is no path to extract data beyond what was already intentionally shared within that relationship.

So even if a prompt injection occurs: it cannot expand access, escalate privileges, or reach new destinations.

Which means the “lethal trifecta” is present in theory, but constrained in practice to the point where it cannot be exploited in a meaningful way.


Designing for inevitability, not prevention

Most AI security discussions assume: “We must prevent prompt injection.”

In practice, that’s unrealistic, because prompt injection is:

  • Cheap

  • Easy

  • Almost inevitable

So the correct approach should be instead : “Assume it will happen, and design so it doesn’t matter”!

That is the core of the AgLabs approach.


A familiar analogy: email already has this problem

Today’s insurance market already runs on:

  • Email attachments

  • Free text

  • External inputs

  • Unknown formatting

This is effectively an unstructured, unauthenticated prompt injection surface — at scale

Now compare that to A2A:

Email world

A2A world

Anyone can initiate contact

Only authorised agents can connect

Identity is weakly verified

Identity is cryptographically enforced

Data sharing is discretionary

Data sharing is pre-scoped and controlled

Detection is human-led

Enforcement is system-level


What this means in practice

For any market participants:

  • You decide who your agent can talk to

  • You decide what data is shared

  • You retain full control of decisions

  • Every interaction is traceable and auditable

And most importantly: the system cannot leak what it was never able to access in the first place.


Final thoughts

The three elements of the “lethal trifecta” are present in an our A2A approach. Our agents handle data, can communicate externally, and interact with inputs beyond direct organisational control.

So the question is not whether the risk exists in theory.

It is whether the system allows that risk to be exploited in practice.

In our permissioned A2A model, it does not:

  • Data is explicitly scoped to each interaction

  • Counterparties are explicitly authorised

  • Communication is restricted to those same parties

There is no path to expand access, reach new systems, or extract information beyond what was already intentionally shared.

That is the key distinction of the AgLabs approach., which is not about preventing prompt injection. It is about designing a system where, even if it happens, nothing meaningful can go wrong.

This is what enables a secure, scalable Agent-to-Agent insurance market — by design, not by patching.