
Engineering Leads Track

By the end of this track, you will be able to:

  • Identify the production failure modes in multi-agent systems that conventional tooling misses
  • Evaluate your current observability stack against chain-level integrity requirements
  • Design verification receipt patterns and epistemic integrity interfaces for agent chains
  • Select and implement MASO controls with concrete integration patterns for common agent frameworks
  • Build instrumentation, dashboards, and testing pipelines that catch Phantom Compliance-style failures
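To make the "verification receipt" objective concrete, here is a minimal sketch of what such a receipt might look like as a data structure. All names and fields here are illustrative assumptions, not part of any real framework or of MASO itself; the point is that a receipt separates what an agent *claims* from what was independently *checked*:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class VerificationReceipt:
    """Hypothetical receipt emitted after each agent step.

    Records what was claimed, what evidence the claim rests on, and who
    checked it, so downstream agents (and auditors) can distinguish
    'verified' from merely 'asserted'.
    """
    agent_id: str
    step_id: str
    claim: str        # what the agent says it did
    evidence: str     # raw evidence backing the claim
    checked_by: str   # verifier identity; "self" attestation is a red flag
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def evidence_hash(self) -> str:
        # Hash the evidence so the receipt can be logged compactly
        # and tampering with it is detectable.
        return hashlib.sha256(self.evidence.encode()).hexdigest()

    def to_log_line(self) -> str:
        # One JSON line per step, suitable for existing log pipelines.
        return json.dumps({
            "agent": self.agent_id,
            "step": self.step_id,
            "claim": self.claim,
            "evidence_sha256": self.evidence_hash,
            "checked_by": self.checked_by,
            "ts": self.timestamp,
        })
```

A chain orchestrator would emit one of these per step and refuse to pass a step's output downstream without one. Modules 3 and 4 cover the real interfaces in detail.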

Your thread

As an engineering lead, your job is to build systems that work reliably in production, and to know when they aren't working before your customers do. Multi-agent AI systems introduce failure modes that don't trigger alerts, don't produce errors, and don't show up in your existing dashboards. This track gives you the "what do I actually build" story.

The golden thread: threat model → what breaks in practice → which controls are runtime vs design-time → what instrumentation looks like.

Every module follows the same five-beat structure, moving from problem to solution:

  1. What goes wrong (scenario-driven)
  2. Why current controls don't catch it (the gap)
  3. What epistemic integrity means in this context (the concept)
  4. Which MASO controls address it (the framework)
  5. How to verify it's working (the evidence)

Modules

  1. What Breaks in Practice (~20 min)
     Concrete production failure modes: the things that page you at 2am.
  2. Why Current Tools Miss It (~20 min)
     Your observability stack, LangSmith, LangFuse: what they give you and what they don't.
  3. Epistemic Integrity (~25 min)
     An engineering requirement, not a philosophy lecture: data structures and interfaces.
  4. MASO Controls (~30 min)
     Execution Control, Observability, Data Protection: what you actually build.
  5. Instrumentation & Evidence (~25 min)
     Metrics, dashboards, canary tests, and chaos testing for agent chains.
  Decision Exercise (~15 min)
     Ambiguous production signals: what do you do next?

Total estimated time: ~2.5 hours
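As a taste of the canary testing covered in Module 5, a Phantom Compliance canary might scan a chain's step log and fail the pipeline when any step reports success without an independently checked receipt. The log schema below is a hypothetical sketch, not a real framework's format:

```python
def find_phantom_steps(chain_log: list[dict]) -> list[str]:
    """Return ids of steps that report success but were never
    independently verified: no receipt, or the agent attested to
    its own work."""
    phantom = []
    for step in chain_log:
        self_attested = step.get("checked_by") == step.get("agent")
        unverified = not step.get("receipt") or self_attested
        if step.get("status") == "success" and unverified:
            phantom.append(step["step"])
    return phantom


# Example: the deploy step claims success, but the deployer
# checked its own work, so the canary flags it.
chain_log = [
    {"step": "plan", "agent": "planner", "status": "success",
     "receipt": True, "checked_by": "reviewer"},
    {"step": "deploy", "agent": "deployer", "status": "success",
     "receipt": True, "checked_by": "deployer"},  # self-attested
]

assert find_phantom_steps(chain_log) == ["deploy"]
```

Running a check like this on synthetic "lying agent" traces in CI is one way to prove the detection path works before a real failure exercises it.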


Prerequisites

Before starting this track, complete the core scenario, which introduces the Phantom Compliance failure that every module references. If you have already read it, start Module 1.


What this track is not

This track does not cover prompt engineering, model fine-tuning, or single-agent guardrail configuration. Those are important, but well-covered elsewhere. This track focuses on the multi-agent runtime problems that existing resources don't address: the failures that happen between agents, not within them.


Start Module 1 →