Open-source AI infrastructure

Red Team Replay

Replayable incident traces for AI agent failures.

Python · MIT licensed · Offline by default · Community extensible

Purpose

AI incidents are hard to reproduce because prompts, tool calls, hidden context, retrieval, and approvals are scattered across logs.

Red Team Replay provides a normalized replay bundle validator that gathers prompts, context, tool calls, decisions, outcomes, and mitigation notes into a single reviewable packet.

What it does

Validates a domain-specific AI governance packet, scores its replay readiness, and returns concrete findings that contributors can act on.
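As a rough illustration of the validate-and-score flow described above, here is a minimal sketch. The `score_packet` helper, section names, and scoring weights are assumptions for illustration, not the project's actual API:

```python
# Hypothetical sketch of packet readiness scoring; section names and
# the scoring formula are assumptions, not the project's real schema.
REQUIRED_SECTIONS = ("incident", "trace", "mitigation")

def score_packet(packet: dict) -> tuple[int, list[str]]:
    """Return a 0-100 readiness score and a list of concrete findings."""
    findings = []
    for section in REQUIRED_SECTIONS:
        if section not in packet:
            findings.append(f"missing section: {section}")
    if not packet.get("trace"):
        findings.append("trace is empty; nothing to replay")
    # Each finding costs 30 points; score never goes below zero.
    score = max(0, 100 - 30 * len(findings))
    return score, findings

score, findings = score_packet({"incident": {"id": "x"}, "trace": []})
print(score, findings)  # → 40 ['missing section: mitigation', 'trace is empty; nothing to replay']
```

A complete packet (all three sections present, non-empty trace) would score 100 with no findings under this sketch.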

Why it matters

AI systems are moving from chat into action. This project makes one hard operational risk easier to inspect, test, and govern in public.

Who should use it

Teams that need replayable incident traces for AI agent failures. Builders can start with the CLI, then add adapters, fixtures, schemas, and integrations.

Quick Start

PYTHONPATH=src python3 -m unittest discover -s tests
python3 -m red_team_replay.cli sample

Example Packet

{
  "incident": {
    "id": "agi-2026-001",
    "severity": "high"
  },
  "trace": [
    {
      "step": 1,
      "actor": "agent",
      "action": "read_email"
    },
    {
      "step": 2,
      "actor": "tool",
      "action": "send_email"
    }
  ],
  "mitigation": {
    "addedGate": "human_approval"
  }
}
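One way to consume a packet like the one above is to walk the trace and check which tool actions are covered by a mitigation gate. This is a sketch only; the gate and actor semantics here are assumptions, not the project's defined behavior:

```python
import json

# Sketch: flag tool actions in a trace that lack a human-approval gate.
# The "addedGate" interpretation is an assumption for illustration.
packet = json.loads("""
{
  "incident": {"id": "agi-2026-001", "severity": "high"},
  "trace": [
    {"step": 1, "actor": "agent", "action": "read_email"},
    {"step": 2, "actor": "tool", "action": "send_email"}
  ],
  "mitigation": {"addedGate": "human_approval"}
}
""")

tool_steps = [s for s in packet["trace"] if s["actor"] == "tool"]
gated = packet["mitigation"].get("addedGate") == "human_approval"
for step in tool_steps:
    status = "gated" if gated else "UNGATED"
    print(f'step {step["step"]}: {step["action"]} ({status})')  # → step 2: send_email (gated)
```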

Contribution Tracks

Good first issues

  • OpenTelemetry mapping
  • LangSmith imports
  • Privacy scrubbing
  • Replay runners

Core improvements

  • Add JSON Schema validation.
  • Add more real-world, non-sensitive fixtures.
  • Improve scoring transparency and edge-case tests.
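For the JSON Schema bullet above, a starting point might look like the fragment below. The structure follows the example packet; the severity values and `minItems` constraint are assumptions a contributor would need to confirm:

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["incident", "trace", "mitigation"],
  "properties": {
    "incident": {
      "type": "object",
      "required": ["id", "severity"],
      "properties": {
        "id": {"type": "string"},
        "severity": {"enum": ["low", "medium", "high", "critical"]}
      }
    },
    "trace": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["step", "actor", "action"]
      }
    },
    "mitigation": {"type": "object"}
  }
}
```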

Integration work

  • Build adapters for common AI frameworks.
  • Add CI checks and report exports.
  • Connect the packet format to operational workflows.