Purpose
AI incidents are hard to reproduce because prompts, tool calls, hidden context, retrieval, and approvals are scattered across logs.
This project provides a validator for normalized replay bundles that capture prompts, context, tool calls, decisions, outcomes, and mitigation notes in one packet.
What it does
Validates a domain-specific AI governance packet, scores its readiness, and returns concrete findings that contributors can act on.
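As a purely illustrative sketch of the idea (the section names and weights below are assumptions for this sketch, not the project's actual scoring rules), readiness scoring can start as a weighted presence check over packet sections:

# Illustration only; red_team_replay's real scoring may differ.
# Section weights here are assumptions.
REQUIRED_SECTIONS = {"incident": 0.4, "trace": 0.4, "mitigation": 0.2}

def readiness_score(packet):
    """Return a 0..1 score plus findings a contributor can act on."""
    score, findings = 0.0, []
    for section, weight in REQUIRED_SECTIONS.items():
        if packet.get(section):
            score += weight
        else:
            findings.append("missing or empty section: " + section)
    return score, findings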
Why it matters
AI systems are moving from chat into action. This project makes one hard operational risk easier to inspect, test, and govern in public.
Who should use it
Anyone who needs replayable incident traces for AI agent failures. Builders can start with the CLI, then add adapters, fixtures, schemas, and integrations.
Quick Start
# Run the unit tests
PYTHONPATH=src python3 -m unittest discover -s tests

# Produce a sample packet
python3 -m red_team_replay.cli sample
Example Packet
{
  "incident": {
    "id": "agi-2026-001",
    "severity": "high"
  },
  "trace": [
    {
      "step": 1,
      "actor": "agent",
      "action": "read_email"
    },
    {
      "step": 2,
      "actor": "tool",
      "action": "send_email"
    }
  ],
  "mitigation": {
    "addedGate": "human_approval"
  }
}
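One way to read a packet like this, sketched standalone rather than with the project's own replay runner (the gate rule, applying human approval to tool-executed actions, and the filename "packet.json" are assumptions):

import json

def walk_trace(packet):
    """Print each trace step and mark those the added gate would intercept."""
    gate = packet.get("mitigation", {}).get("addedGate")
    for step in packet.get("trace", []):
        # Assumption: a human_approval gate applies to tool-executed actions.
        gated = gate == "human_approval" and step["actor"] == "tool"
        note = "  <- would now require human approval" if gated else ""
        print(f"step {step['step']}: {step['actor']} -> {step['action']}{note}")

# "packet.json" is a hypothetical filename holding the packet above.
with open("packet.json") as handle:
    walk_trace(json.load(handle))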
Contribution Tracks
Good first issues
- OpenTelemetry mapping
- LangSmith imports
- Privacy scrubbing (see the sketch after this list)
- Replay runners
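For the privacy-scrubbing track, one minimal approach (the pattern below only catches email addresses; real scrubbing needs a broader PII ruleset, and this is only an assumption about how the project might do it) is to redact identifiers recursively before a packet is shared:

import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(value):
    """Recursively redact email addresses anywhere in a JSON-like packet."""
    if isinstance(value, str):
        return EMAIL.sub("[redacted-email]", value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged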
Core improvements
- Add JSON Schema validation (see the sketch after this list).
- Add more real-world, non-sensitive fixtures.
- Improve scoring transparency and edge-case tests.
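A possible starting point for the JSON Schema item, using the third-party jsonschema package; the schema covers only the fields in the example packet, and the severity values other than "high" are assumptions:

import jsonschema  # third-party: pip install jsonschema

PACKET_SCHEMA = {
    "type": "object",
    "required": ["incident", "trace"],
    "properties": {
        "incident": {
            "type": "object",
            "required": ["id", "severity"],
            "properties": {
                "id": {"type": "string"},
                # Values other than "high" are assumptions for this sketch.
                "severity": {"enum": ["low", "medium", "high", "critical"]},
            },
        },
        "trace": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["step", "actor", "action"],
            },
        },
        "mitigation": {"type": "object"},
    },
}

def validate_packet(packet):
    # Raises jsonschema.ValidationError with a concrete finding on failure.
    jsonschema.validate(instance=packet, schema=PACKET_SCHEMA)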
Integration work
- Build adapters for common AI frameworks (see the sketch after this list).
- Add CI checks and report exports.
- Connect the packet format to operational workflows.
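As a shape for adapter work (the input event fields here are generic assumptions, not any particular framework's API), most of the job is renaming and renumbering source events into packet trace steps:

def to_trace(events):
    """Map generic framework events onto packet trace steps.

    Assumed input shape: {"source": "agent" or "tool", "name": "action_name"}.
    """
    return [
        {"step": index, "actor": event["source"], "action": event["name"]}
        for index, event in enumerate(events, start=1)
    ]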