AI Data Retention Guidance¶

Data retention requirements for AI systems, covering the 10 data states specific to AI and jurisdictional requirements.

The 10 Data States in AI Systems¶

AI systems create data in states that traditional retention policies may not address:

State	Description	Retention Consideration
1. At rest in feeder systems	Source data before AI processing	Existing policies apply
2. In transit to AI	Data moving to AI system	Transient, no retention
3. In vector store	Embeddings of source documents	May be invertible; apply the source data's classification and retention
4. In model context	Prompt + retrieved context	Session-scoped
5. In model memory	Within-session state	Session-scoped
6. In model response	Generated output	Retain per policy
7. In interaction logs	Full interaction records	Key retention decision
8. In Judge evaluation	Judge inputs and outputs	Retain with interaction
9. In HITL queue	Pending human review	Retain with interaction
10. In backups	Copies of above	Mirror source retention

Retention by Risk Tier¶

CRITICAL Systems¶

Data Type	Minimum Retention	Maximum Retention	Rationale
Full interaction logs	7 years	10 years	Regulatory, audit, litigation
System prompts (versioned)	7 years	Indefinite	Audit trail
Guardrail configuration	7 years	Indefinite	Audit trail
Judge evaluations	7 years	10 years	Assurance evidence
HITL decisions	7 years	10 years	Accountability
Model versions used	7 years	Indefinite	Reproducibility
Incidents	7 years	Indefinite	Lessons learned

HIGH Systems¶

Data Type	Minimum Retention	Maximum Retention	Rationale
Full interaction logs	3 years	7 years	Regulatory, investigation
System prompts (versioned)	3 years	Indefinite	Audit trail
Guardrail configuration	3 years	Indefinite	Audit trail
Judge evaluations	3 years	5 years	Assurance evidence
HITL decisions	3 years	5 years	Accountability
Model versions used	3 years	Indefinite	Reproducibility
Incidents	5 years	Indefinite	Lessons learned

MEDIUM Systems¶

Data Type	Minimum Retention	Maximum Retention	Rationale
Metadata + sampled content	1 year	3 years	Trend analysis
System prompts (versioned)	1 year	3 years	Audit trail
Guardrail configuration	1 year	3 years	Audit trail
Judge evaluations (sampled)	1 year	3 years	Assurance evidence
Model versions used	1 year	3 years	Reproducibility
Incidents	3 years	5 years	Lessons learned

LOW Systems¶

Data Type	Minimum Retention	Maximum Retention	Rationale
Basic metadata	90 days	1 year	Troubleshooting
System prompts (current)	90 days	1 year	Reference
Incidents	1 year	3 years	Lessons learned
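The tier tables above can be expressed as a lookup that automated deletion jobs consult. The following is a minimal sketch, not a definitive implementation: the key names (`interaction_logs`, `basic_metadata`, and so on), the `YEAR = 365` simplification, and the function names are assumptions for illustration, and only a few rows are copied in.

```python
from datetime import date, timedelta

YEAR = 365  # assumption: a year is treated as 365 days (leap days ignored)

# (tier, data type) -> (minimum days, maximum days); None means "Indefinite".
# Values are copied from the tier tables; the key names are hypothetical.
RETENTION_DAYS = {
    ("CRITICAL", "interaction_logs"): (7 * YEAR, 10 * YEAR),
    ("CRITICAL", "system_prompts"): (7 * YEAR, None),
    ("HIGH", "interaction_logs"): (3 * YEAR, 7 * YEAR),
    ("HIGH", "incidents"): (5 * YEAR, None),
    ("MEDIUM", "incidents"): (3 * YEAR, 5 * YEAR),
    ("LOW", "basic_metadata"): (90, 1 * YEAR),
}


def retention_window(tier, data_type, created):
    """Return (earliest, latest) deletion dates; latest is None if indefinite."""
    min_days, max_days = RETENTION_DAYS[(tier, data_type)]
    earliest = created + timedelta(days=min_days)
    latest = None if max_days is None else created + timedelta(days=max_days)
    return earliest, latest


def may_delete(tier, data_type, created, today):
    """True once the minimum retention period has passed."""
    earliest, _ = retention_window(tier, data_type, created)
    return today >= earliest


def must_delete(tier, data_type, created, today):
    """True once the maximum retention period has been exceeded."""
    _, latest = retention_window(tier, data_type, created)
    return latest is not None and today > latest
```

Legal holds and jurisdictional minimums still override these windows, so a job would check them before acting on `may_delete` or `must_delete`.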

Jurisdictional Requirements¶

United Kingdom¶

Regulation	Data Type	Requirement
UK GDPR	Personal data	Delete when no longer necessary; document lawful basis
FCA SYSC 9	Records of services and transactions	5 years minimum
FCA COBS 11	Order records	5 years
PRA SS1/23	Model documentation	Duration of model use + 5 years
Consumer Duty	Evidence of fair outcomes	5 years

European Union¶

Regulation	Data Type	Requirement
GDPR	Personal data	Delete when no longer necessary; document lawful basis
EU AI Act	High-risk AI logs	6 months minimum, longer if needed for obligations
EU AI Act	Technical documentation (high-risk)	10 years after the system is placed on the market or put into service
MiFID II	Transaction records	5 years
PSD2	Payment records	5 years

United States¶

Regulation	Data Type	Requirement
SOX	Financial records	7 years
HIPAA	Compliance documentation (policies, authorisations, disclosures)	6 years; medical record retention is set by state law
GLBA	Financial customer information	5 years
CCPA/CPRA	Consumer data	Varies; disclose retention periods
SEC Rule 17a-4	Broker-dealer records	3-6 years depending on type
State laws	Varies	Check applicable states

Banking-Specific (Global)¶

Standard	Data Type	Requirement
Basel III	Risk model documentation	Duration of use + review cycle
SR 11-7	Model documentation, validation	Duration of use + examination cycle
BCBS 239	Risk data	Sufficient for risk reporting

Interaction Log Content¶

What to Log (by Tier)¶

Field	CRITICAL	HIGH	MEDIUM	LOW
Timestamp	✓	✓	✓	✓
User identity	✓	✓	✓	Optional
Session ID	✓	✓	✓	✓
Model version	✓	✓	✓	Optional
Model parameters	✓	✓	Optional	Optional
System prompt version	✓	✓	✓	Optional
Full user input	✓	✓	Sampled	Optional
Retrieved context (RAG)	✓	✓	Reference only	No
Full model output	✓	✓	Sampled	Optional
Guardrail results	✓	✓	✓	✓
Latency metrics	✓	✓	✓	✓
Cost	✓	✓	✓	Optional
Judge evaluation	✓	✓	Sampled	No
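The field matrix above can be applied at logging time by filtering each interaction event against its tier. This is a minimal sketch under stated assumptions: the field keys, the shape of `event`, and the `doc_id` key on retrieved chunks are all hypothetical, and the caller decides whether an interaction is in the sample.

```python
TIERS = ("CRITICAL", "HIGH", "MEDIUM", "LOW")

# One rule per tier, in TIERS order, mirroring the table above.
# Field keys are illustrative names, not a required schema.
LOG_POLICY = {
    "timestamp": ("always", "always", "always", "always"),
    "user_identity": ("always", "always", "always", "optional"),
    "session_id": ("always", "always", "always", "always"),
    "model_version": ("always", "always", "always", "optional"),
    "model_parameters": ("always", "always", "optional", "optional"),
    "system_prompt_version": ("always", "always", "always", "optional"),
    "user_input": ("always", "always", "sampled", "optional"),
    "retrieved_context": ("always", "always", "reference_only", "never"),
    "model_output": ("always", "always", "sampled", "optional"),
    "guardrail_results": ("always", "always", "always", "always"),
    "latency_ms": ("always", "always", "always", "always"),
    "cost": ("always", "always", "always", "optional"),
    "judge_evaluation": ("always", "always", "sampled", "never"),
}


def build_log_record(tier, event, sampled=False, include_optional=False):
    """Keep only the fields the tier allows.

    `sampled` says whether this interaction was chosen for content sampling;
    `include_optional` switches on the fields marked Optional.
    """
    column = TIERS.index(tier)
    record = {}
    for field, rules in LOG_POLICY.items():
        if field not in event:
            continue
        rule = rules[column]
        if rule == "always":
            record[field] = event[field]
        elif rule == "sampled" and sampled:
            record[field] = event[field]
        elif rule == "optional" and include_optional:
            record[field] = event[field]
        elif rule == "reference_only":
            # Store document references instead of the retrieved text.
            record[field] = [chunk["doc_id"] for chunk in event[field]]
    return record
```

Making the sampling decision once per interaction keeps the input, output, and Judge evaluation of a sampled interaction together.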

What NOT to Log¶

Data Type	Reason	Alternative
Full credit card numbers	PCI-DSS	Mask (last 4 digits)
Full SSN/national ID	Regulatory	Mask or tokenise
Passwords/credentials	Security	Never log
Raw biometric data	Privacy	Hash or don't log
Health data (unless required)	HIPAA/GDPR	Minimise or mask
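As an illustration of the masking alternatives above, the sketch below scrubs text before it reaches the log. The two regular expressions are assumptions chosen for brevity: they cover only common card layouts and the US SSN format `NNN-NN-NNNN`, perform no Luhn check, and will miss other formats, so a production system would rely on a dedicated PII detector.

```python
import re

# Illustrative patterns only; both are simplifying assumptions.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional separators
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US SSN written with hyphens


def _mask_card(match):
    """Replace all but the last four digits of a card number with asterisks."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def scrub(text):
    """Redact SSNs and mask card numbers in a string before logging it."""
    text = SSN_RE.sub("[SSN REDACTED]", text)
    return CARD_RE.sub(_mask_card, text)
```

For example, `scrub("card 4111 1111 1111 1111 ok")` keeps only the last four digits of the card number.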

PII in Logs¶

Detection and Handling¶

Stage	Action
At logging time	Detect PII using guardrails; flag or redact
In storage	Encrypt at rest; access controls
At retrieval	Verify authorisation; mask if displaying
At deletion	Ensure complete removal including backups

Redaction vs. Tokenisation¶

Approach	Use When	Tradeoff
Redaction	PII not needed for any purpose	Data lost permanently
Tokenisation	Need to re-identify for investigation	Token mapping must be secured
Masking	Partial visibility sufficient	Some data visible
Encryption	Full data needed, access controlled	Key management overhead
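The difference between redaction and tokenisation can be sketched as follows. The `TokenVault` class, its method names, and the `tok_` prefix are hypothetical; the in-memory mapping stands in for the secured token store that the table's tradeoff column refers to.

```python
import hashlib
import hmac


def redact(_value):
    """Redaction: the original value is discarded and cannot be recovered."""
    return "[REDACTED]"


class TokenVault:
    """Tokenisation: replace a value with a token that can be reversed
    only through the vault. The mapping here is an in-memory dict for
    illustration; in practice it must be stored and access-controlled
    separately from the logs."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._mapping = {}  # token -> original value

    def tokenise(self, value: str) -> str:
        # Keyed hash, so the same value always yields the same token
        # and tokens cannot be recomputed without the key.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"tok_{digest[:16]}"
        self._mapping[token] = value
        return token

    def reidentify(self, token: str, authorised: bool) -> str:
        if not authorised:
            raise PermissionError("re-identification requires authorisation")
        return self._mapping[token]
```

Deterministic tokens let investigators correlate log entries about the same person without seeing the underlying value, at the cost of having to protect both the key and the mapping.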

Vector Store Retention¶

Vector embeddings require special consideration:

Concern	Guidance
Embeddings can be inverted	Treat embeddings with same classification as source
Deletion complexity	Deleting from vector store may require rebuild
Versioning	Track which documents are in which version of store
Staleness	Set refresh/review cycles (see AI.5.4)

Recommended Approach¶

- Classify vector store content at source data level
- Track lineage from source documents to embeddings
- Implement deletion procedures that work with your vector DB
- Verify deletions are complete (not just soft-deleted)
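The approach above can be sketched with a small lineage index. Everything here is illustrative: `InMemoryVectorStore` is a stand-in with two hypothetical methods (`delete` and `present`), not the API of any real vector database, and a real deployment would map these calls onto whatever its store provides.

```python
from collections import defaultdict


class InMemoryVectorStore:
    """Stand-in for a vector DB, exposing only the calls used below."""

    def __init__(self):
        self.vectors = {}  # embedding id -> vector

    def delete(self, ids):
        for embedding_id in ids:
            self.vectors.pop(embedding_id, None)

    def present(self, ids):
        """Return the subset of ids that still exist in the store."""
        return {i for i in ids if i in self.vectors}


class LineageIndex:
    """Tracks which embeddings came from which source document."""

    def __init__(self):
        self._embeddings = defaultdict(set)
        self._classification = {}

    def record(self, doc_id, classification, embedding_ids):
        # Embeddings inherit the classification of their source document.
        self._classification[doc_id] = classification
        self._embeddings[doc_id].update(embedding_ids)

    def classification_of(self, doc_id):
        return self._classification[doc_id]

    def embeddings_for(self, doc_id):
        return set(self._embeddings.get(doc_id, ()))

    def forget(self, doc_id):
        self._embeddings.pop(doc_id, None)
        self._classification.pop(doc_id, None)


def delete_document(doc_id, lineage, store):
    """Delete a document's embeddings and verify they are gone.

    Returns the ids that survived deletion; an empty set means verified.
    """
    ids = lineage.embeddings_for(doc_id)
    store.delete(ids)
    leftover = store.present(ids)
    if not leftover:
        lineage.forget(doc_id)  # drop lineage only after verified removal
    return leftover
```

Keeping the lineage entry until removal is verified means a failed or soft deletion can be retried rather than silently lost.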

Judge and HITL Data¶

Judge Evaluation Retention¶

Judge evaluations contain:

- Copy of interaction being evaluated
- Judge's analysis and findings
- Metadata (Judge model version, evaluation time)

Retain Judge evaluations for the same period as the underlying interaction, because they are part of the audit trail.

HITL Decision Retention¶

HITL decisions must capture:

- What the human reviewed
- What decision they made
- Why (if documented)
- Who made the decision
- When

Retain HITL decisions for accountability, typically for the same period as the underlying interaction or longer.
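The fields a HITL decision must capture map naturally onto an immutable record. The sketch below is one possible shape; the class and field names are assumptions, and the timezone check is an added safeguard rather than a requirement stated in this guidance.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded decision should not be edited
class HITLDecision:
    interaction_id: str               # what the human reviewed
    decision: str                     # what decision they made
    reviewer_id: str                  # who made the decision
    decided_at: datetime              # when
    rationale: Optional[str] = None   # why, if documented

    def __post_init__(self):
        # Naive timestamps are ambiguous across regions, so reject them.
        if self.decided_at.tzinfo is None:
            raise ValueError("decided_at must be timezone-aware")


record = HITLDecision(
    interaction_id="int-001",
    decision="approved",
    reviewer_id="reviewer-42",
    decided_at=datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc),
    rationale="Output matched policy wording",
)
log_entry = asdict(record)  # plain dict, ready to serialise with the interaction
```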

Deletion Procedures¶

Standard Deletion¶

Step	Action	Verification
1	Identify data eligible for deletion	Query by retention date
2	Verify no legal hold	Check with legal
3	Delete from primary storage	Confirm deletion
4	Delete from backups (per backup policy)	Confirm in next backup cycle
5	Delete from vector stores if applicable	Verify removal
6	Log deletion	Maintain deletion record
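The steps in the table can be sketched as a single deletion pass. This is a simplified illustration: stores are modelled as plain dicts, the record shape (`id`, `delete_after`) is hypothetical, and backup deletion (step 4) is left to the backup policy rather than shown.

```python
from datetime import date


def run_deletion(records, held_ids, stores, today):
    """Delete eligible records from every store and return a deletion log.

    records  : iterable of {"id": str, "delete_after": date}
    held_ids : set of record ids under legal hold
    stores   : mapping of store name -> dict keyed by record id
    """
    deletion_log = []
    for rec in records:
        # Step 1: identify data eligible for deletion.
        if rec["delete_after"] > today:
            continue
        # Step 2: verify no legal hold.
        if rec["id"] in held_ids:
            deletion_log.append(
                {"id": rec["id"], "action": "skipped_legal_hold", "date": today.isoformat()}
            )
            continue
        # Steps 3 and 5: delete from primary storage and vector stores.
        for store in stores.values():
            store.pop(rec["id"], None)
        # Verification: confirm the record is gone from every store.
        verified = all(rec["id"] not in store for store in stores.values())
        # Step 6: log the deletion.
        deletion_log.append(
            {"id": rec["id"], "action": "deleted", "verified": verified, "date": today.isoformat()}
        )
    return deletion_log
```

Logging skipped records as well as deleted ones gives auditors evidence that holds were respected.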

Legal Hold¶

When litigation or regulatory investigation is anticipated:

1. Identify potentially relevant data
2. Suspend deletion for that data
3. Document the hold scope and duration
4. Notify relevant personnel
5. Release hold only when legal confirms
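A hold registry that deletion jobs consult is one way to implement the legal hold steps. The sketch below assumes hypothetical class and field names and keeps released holds so that historical holds remain documented; notifying personnel is out of scope.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass
class LegalHold:
    hold_id: str
    reason: str                 # documented scope of the hold
    record_ids: Set[str]        # data identified as potentially relevant
    placed_on: date
    released_on: Optional[date] = None


class HoldRegistry:
    def __init__(self):
        self._holds = {}

    def place(self, hold: LegalHold) -> None:
        self._holds[hold.hold_id] = hold

    def release(self, hold_id: str, released_on: date, legal_confirmed: bool) -> None:
        # A hold is released only when legal confirms.
        if not legal_confirmed:
            raise PermissionError("legal confirmation is required to release a hold")
        self._holds[hold_id].released_on = released_on

    def is_held(self, record_id: str) -> bool:
        """True if any active hold covers the record; deletion must be suspended."""
        return any(
            hold.released_on is None and record_id in hold.record_ids
            for hold in self._holds.values()
        )

    def history(self):
        """All holds, active and released, for audit documentation."""
        return list(self._holds.values())
```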

Backup Considerations¶

Backup Type	Retention Approach
Daily incremental	30-90 days
Weekly full	90 days - 1 year
Monthly archive	Per data classification
Disaster recovery	Mirror primary retention

Key principle: Backup retention should not exceed primary retention without explicit justification. Otherwise you have data you should have deleted.
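The key principle lends itself to an automated check. The sketch below flags backup types whose retention exceeds primary retention unless they carry a documented justification; the function name and the example figures are assumptions.

```python
def backup_retention_violations(primary_days, backup_days, justified=()):
    """Return backup types retained longer than primary data without justification.

    primary_days : retention of the primary copy, in days
    backup_days  : mapping of backup type -> retention in days
    justified    : backup types with an approved, documented exception
    """
    return sorted(
        name
        for name, days in backup_days.items()
        if days > primary_days and name not in justified
    )


# Example: primary data is kept for one year.
backups = {"daily_incremental": 90, "weekly_full": 365, "monthly_archive": 1095}
violations = backup_retention_violations(365, backups)
```

Here `violations` contains only `monthly_archive`, which would need either a shorter retention or a recorded exception.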

Audit and Compliance¶

Documentation Requirements¶

Maintain documentation of:

- Retention policy (this document)
- Data inventory (what AI data exists where)
- Deletion logs (what was deleted when)
- Legal holds (active and historical)
- Exceptions (with justification and approval)

Periodic Review¶

Review Type	Frequency	Scope
Policy review	Annual	Update for regulatory changes
Implementation audit	Annual	Verify policy is followed
Deletion verification	Quarterly	Sample check that deletion occurred
Legal hold review	Quarterly	Confirm holds still needed

Implementation Checklist¶

Initial Setup¶

- Classify all AI data by tier and type
- Configure log retention periods
- Implement automated deletion
- Set up deletion verification
- Document exceptions process
- Train operations team

Ongoing¶

- Monitor deletion job success
- Review and respond to legal holds
- Update policy for regulatory changes
- Audit compliance quarterly
- Report retention metrics

AI Runtime Behaviour Security, 2026 (Jonathan Gill).