MASO Control Domain: Data Protection¶

Part of the MASO Framework · Control Specifications Covers: LLM02 (Sensitive Info Disclosure) · LLM04 (Data/Model Poisoning) · ASI06 (Memory & Context Poisoning) · LLM08 (Vector/Embedding Weaknesses) Also covers: DR-02 (RAG Poisoning/Corpus Drift) · DR-03 (Derived Data Elevation)

Principle¶

Data flows between agents must be classified, controlled, and monitored. An agent's access to data is determined by its own classification level, not by the classification of the agent that sent the data. Shared knowledge bases are integrity-checked. Persistent memory is isolated per agent and has a finite lifespan.

In a multi-agent system, every inter-agent message is a data transfer across a trust boundary. The message bus is not just a communication channel — it is the primary data loss prevention enforcement point.

Why This Matters in Multi-Agent Systems¶

Implicit data flows through delegation. When Agent A asks Agent B to summarise a document, Agent A's context — including any sensitive data it has processed — may leak into the request. Agent B, which may have a lower data classification, now has access to data it shouldn't see. The developer didn't intend a data transfer; the delegation created one.

RAG poisoning scales across agents. A poisoned document in a shared vector database doesn't just affect one model — it affects every agent that queries that database. In a multi-agent system, the poisoned data can be retrieved by one agent and passed to others through the message bus, amplifying the poisoning across the entire orchestration.

Memory becomes a persistent attack surface. If agents have persistent memory across sessions, poisoned data injected in one session persists into future sessions. In single-model systems, this is a context window risk. In multi-agent systems, a poisoned memory in one agent can contaminate others through shared interactions.

Cross-classification data mixing. Different agents may legitimately operate at different data classification levels — one processes public data, another processes confidential customer records. Without explicit fencing, the message bus becomes a channel for data to flow from high-classification agents to low-classification agents.

Derived data elevation (DR-03). An agent combines two individually non-sensitive data fields — customer ID and purchase history — and the result is PII. An analyst agent aggregates anonymised records and the output is re-identifiable. In multi-agent systems, data passes through multiple processing stages, and each stage can increase the effective classification of the output without any agent recognising the transition. The output is treated at the classification of the inputs, when it should be treated higher. Classification must be reassessed after processing, not just inherited from source data.

Controls by Tier¶

Tier 1 — Supervised¶

Control	Requirement	Implementation Notes
DP-1.1 Data classification	Classification applied to all agent data flows (input, output, inter-agent)	At minimum: public, internal, confidential, restricted.
DP-1.2 Logical separation	Agents handling different classification levels do not share context or memory	Enforced by policy at Tier 1; infrastructure at Tier 2.
DP-1.3 Output logging	All agent outputs captured and available for review	Enables post-hoc detection of sensitive data leakage.
DP-1.4 RAG inventory	RAG data sources inventoried per agent	Organisation knows which knowledge bases each agent accesses.
DP-1.5 Data flow diagram	Documented diagram showing what data moves between which agents	Must be maintained when agents or data sources change.
DP-1.6 Classification metadata propagation	Data classification tags travel with data through inter-agent messages	Messages without classification metadata are rejected by the bus. Prevents classification from being lost between agents.

Tier 2 — Managed¶

All Tier 1 controls remain active, plus:

Control	Requirement	Implementation Notes
DP-2.1 DLP on message bus	Inter-agent messages scanned for sensitive data patterns before delivery	PII, credentials, financial data, health data. Messages above recipient's classification are blocked.
DP-2.2 RAG integrity and freshness validation	Knowledge base content checksummed at ingestion; periodic verification including content currency	Changes trigger automated review. Recommended: daily integrity checks. Freshness metadata tracks whether content has been superseded; documents past defined freshness window flagged for review (Amendment: DR-02).
DP-2.3 Infrastructure data fencing	Cross-agent data isolation enforced at platform level	Agent A at "confidential" cannot access Agent B's "restricted" data store, even with application-layer compromise.
DP-2.4 Memory isolation	Per-agent persistent memory isolated; agents cannot read/write other agents' memory	Shared state mediated exclusively through the message bus with DLP scanning.
DP-2.5 Derived data reclassification	Agent outputs are assessed for classification elevation when combining or enriching data from multiple sources	DLP evaluates whether the output classification should be higher than the input classification. Elevation rules defined per data type (e.g., combining identifiers with behavioural data = PII). Elevated data is tagged at the new classification before it enters the message bus (DR-03).

Tier 3 — Autonomous¶

All Tier 2 controls remain active, plus:

Control	Requirement	Implementation Notes
DP-3.1 Real-time RAG integrity	Checksums verified at query time, not just periodically	Retrieval blocked if checksum mismatch at the moment of access.
DP-3.2 Memory decay	Agent persistent memory has a maximum retention window; auto-purged beyond it	Limits the lifespan of poisoned data. Retention windows documented per agent.
DP-3.3 Cross-session memory analysis	Independent analysis agent reviews stored memory entries for poisoning indicators	Operates on separate infrastructure; cannot be influenced by task agents.
DP-3.4 Data provenance chain	Every data item carries provenance metadata: origin, timestamp, agent, transformations	Enables root cause tracing when data integrity issues are detected.

Testing Criteria¶

Tier 1 Tests¶

Test ID	Test	Pass Criteria
DP-T1.1	Classification audit	Every agent's data flows (input, output, inter-agent) have a documented classification.
DP-T1.2	Cross-classification review	Verify no agent processing confidential data shares context with agents at lower classification without explicit approval.
DP-T1.3	Output log completeness	Perform 20 agent actions. Verify all 20 outputs appear in the audit log.
DP-T1.4	RAG inventory accuracy	Compare documented RAG inventory against actual agent configurations. No undocumented sources.
DP-T1.5	Classification metadata	Send 10 inter-agent messages. Verify each carries classification metadata. Send a message without classification metadata and verify it is rejected by the bus.

Tier 2 Tests¶

Test ID	Test	Pass Criteria
DP-T2.1	DLP detection rate	Send 50 test messages containing known PII patterns through the bus. DLP detection rate ≥ 95%.
DP-T2.2	DLP evasion	Attempt common evasion techniques (Base64 encoding, character substitution, fragmentation). Measure bypass rate. Target: < 10% bypass.
DP-T2.3	RAG integrity and freshness	Modify a document in the RAG store. Verify the integrity check detects the modification within the defined schedule. Also: mark a document as superseded. Verify the freshness check flags it for review.
DP-T2.4	Cross-agent data fencing	From within an agent's execution environment, attempt to access another agent's data store. Access is blocked at the infrastructure level.
DP-T2.5	Memory isolation	From within an agent, attempt to read another agent's persistent memory. Read is blocked.
DP-T2.6	Derived data elevation	Agent combines two non-sensitive fields (e.g., anonymised ID + location history) that together constitute PII. Verify the output is reclassified at the higher level before entering the message bus.

Tier 3 Tests¶

Test ID	Test	Pass Criteria
DP-T3.1	Real-time RAG integrity	Modify a RAG document. Immediately attempt retrieval. Retrieval is blocked due to checksum mismatch.
DP-T3.2	Memory decay enforcement	Write a memory entry. Wait for the retention window to expire. Verify the entry is purged.
DP-T3.3	Memory poisoning detection	Inject a deliberately inconsistent memory entry. Verify the cross-session analysis agent flags it within the configured analysis interval.
DP-T3.4	Provenance chain verification	Trace a data item end-to-end from ingestion through agent processing to final output. All provenance metadata is present and consistent.

Maturity Indicators¶

Level	Indicator
Initial	No data classification on agent data flows. Agents share RAG sources without access controls. No DLP on inter-agent communication.
Managed	Data classification applied. RAG sources inventoried. Agent outputs logged. Data flow diagram exists.
Defined	DLP active on message bus. RAG integrity validated on schedule. Cross-agent data fencing enforced at infrastructure level. Per-agent memory isolation.
Quantitatively Managed	DLP detection rate measured and reported. RAG integrity check frequency and results tracked. Memory isolation tested regularly with documented results.
Optimising	Real-time RAG integrity at query time. Memory decay policies with documented rationale. Independent memory analysis agent. Full data provenance chain.

Common Pitfalls¶

Classifying agents instead of data flows. An agent is not "confidential" — it processes data that is confidential. The same agent might process both internal and confidential data depending on the task. Classification must be applied to the data flows, not the agent itself.

Trusting RAG content because it's internal. RAG databases are a persistent injection point. An attacker who can modify a document in the knowledge base has a standing injection into every agent that queries it. Integrity validation is not optional.

Assuming memory isolation from model provider guarantees. Model providers may offer session isolation, but if your orchestration framework maintains its own context store (which most do), that store is the actual memory surface. The provider's isolation guarantees don't cover your framework's state management.

Scanning outputs but not inter-agent messages. DLP on final outputs catches data leakage to end users. But in a multi-agent system, the more dangerous leak path is agent-to-agent — where sensitive data crosses trust boundaries invisibly within the orchestration.

Inheriting input classification without reassessment. An agent that combines public customer IDs with internal behavioural data produces output that is neither public nor internal — it's PII. If the output inherits the classification of the higher input ("internal"), it's still under-classified. Classification must be reassessed after processing, particularly when agents combine, enrich, or aggregate data from multiple sources. The output classification may be higher than any individual input.

Dropping classification metadata between agents. If classification tags don't travel with data through the message bus, downstream agents and DLP controls have no basis for enforcement. A message arriving without a classification tag should be treated as an error, not as unclassified.¶

AI Runtime Behaviour Security, 2026 (Jonathan Gill).