Purpose
Model providers change model behavior, safety filters, latency, and pricing over time, but downstream teams often discover regressions only after users do. This project provides a change-watch manifest that stores prompts, model versions, outputs, expected invariants, and regression notes, so behavior changes can be caught early instead of in production.
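A minimal sketch of the manifest shape in TypeScript, mirroring the example packet below; the field names come from that example, and anything beyond them is an assumption:

// Sketch only: field names mirror the example packet; comments are assumptions.
interface Suite {
  name: string;   // e.g. "billing-agent-regression"
  owner: string;  // owning team
}

interface Run {
  model: string;       // provider model version, e.g. "model-a-2026-04"
  promptHash: string;  // hash of the exact prompt that was sent
  passed: boolean;     // whether the run satisfied the suite's invariants
}

interface ChangeWatchManifest {
  suite: Suite;
  runs: Run[];
  invariants: string[]; // named behavioral rules, e.g. "never_refund_without_policy_match"
}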
What it does
Validates a domain-specific AI governance packet, scores its readiness, and returns concrete findings that contributors can act on.
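To make "validates, scores, returns findings" concrete, here is a hedged sketch using the ChangeWatchManifest type above; validatePacket, scoreReadiness, and the scoring rule are illustrative, not the project's actual API:

// Illustrative only: names and the scoring rule are assumptions.
interface Finding {
  path: string;     // where in the packet the issue sits, e.g. "runs"
  message: string;  // what a contributor should fix
}

function validatePacket(packet: ChangeWatchManifest): Finding[] {
  const findings: Finding[] = [];
  if (packet.runs.length === 0) {
    findings.push({ path: "runs", message: "record at least one run" });
  }
  if (packet.invariants.length === 0) {
    findings.push({ path: "invariants", message: "declare at least one invariant" });
  }
  return findings;
}

function scoreReadiness(packet: ChangeWatchManifest): number {
  // One plausible rule: the fraction of runs that passed, as a 0-100 score.
  const passed = packet.runs.filter((r) => r.passed).length;
  return packet.runs.length ? Math.round((100 * passed) / packet.runs.length) : 0;
}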
Why it matters
AI systems are moving from chat into action. This project makes one hard operational risk, silent model behavior drift, easier to inspect, test, and govern in public.
Who should use it
Teams that need to track behavior changes across model versions and prompt releases. Builders can start with the CLI, then add adapters, fixtures, schemas, and integrations.
Quick Start
npm install
npm test
npm start -- sample
Example Packet
{
"suite": {
"name": "billing-agent-regression",
"owner": "platform"
},
"runs": [
{
"model": "model-a-2026-04",
"promptHash": "abc",
"passed": true
}
],
"invariants": [
"never_refund_without_policy_match"
]
}
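A usage sketch tying the pieces together: load a packet like the one above and report findings. The file name and the validatePacket/scoreReadiness helpers are the hypothetical ones from the earlier sketches:

import { readFileSync } from "node:fs";

// Hypothetical: reuses the types and helpers sketched earlier.
const packet = JSON.parse(readFileSync("packet.json", "utf8")) as ChangeWatchManifest;
const findings = validatePacket(packet);

console.log(`readiness: ${scoreReadiness(packet)}/100`);
for (const f of findings) {
  console.error(`${f.path}: ${f.message}`);
}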
Contribution Tracks
Good first issues
- provider adapters (see the adapter sketch after this list)
- scheduled eval runners
- diff visualizers
- alert integrations
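For the adapter track, one possible shape of a provider adapter; this contract is an assumption, not the project's defined interface:

// Hypothetical adapter contract; the project may define it differently.
interface ProviderAdapter {
  name: string; // e.g. "echo" or a real provider's name
  // Send one prompt to one model version and return the raw output.
  // Invariant checks and pass/fail scoring happen downstream.
  complete(model: string, prompt: string): Promise<string>;
}

// A stub adapter, handy for tests and fixtures: no network calls.
const echoAdapter: ProviderAdapter = {
  name: "echo",
  async complete(_model, prompt) {
    return prompt; // reflects the prompt back unchanged
  },
};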
Core improvements
- Add JSON Schema validation (a draft schema is sketched after this list).
- Add more real-world, non-sensitive fixtures.
- Improve scoring transparency and edge-case tests.
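A draft JSON Schema for the packet, derived from the example above and expressed as a TypeScript constant; which fields are required is an assumption to refine:

// Draft schema (2020-12) built from the example packet's fields.
const packetSchema = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  type: "object",
  required: ["suite", "runs", "invariants"],
  properties: {
    suite: {
      type: "object",
      required: ["name", "owner"],
      properties: { name: { type: "string" }, owner: { type: "string" } },
    },
    runs: {
      type: "array",
      items: {
        type: "object",
        required: ["model", "promptHash", "passed"],
        properties: {
          model: { type: "string" },
          promptHash: { type: "string" },
          passed: { type: "boolean" },
        },
      },
    },
    invariants: { type: "array", items: { type: "string" } },
  },
} as const;

A validator such as Ajv could compile this schema and turn violations into the concrete findings described above.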
Integration work
- Build adapters for common AI frameworks.
- Add CI checks and report exports (a CI sketch follows this list).
- Connect the packet format to operational workflows.
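As one CI sketch: a small script that exits non-zero when validation fails, so any CI system can gate merges on packet health. The script path and validatePacket are hypothetical, carried over from the earlier sketches:

// scripts/check-packet.ts (hypothetical path)
// Exits non-zero when the packet has findings, so CI can block the merge.
import { readFileSync } from "node:fs";

const packet = JSON.parse(readFileSync(process.argv[2] ?? "packet.json", "utf8"));
const findings = validatePacket(packet); // hypothetical validator from the sketches above

if (findings.length > 0) {
  for (const f of findings) console.error(`FAIL ${f.path}: ${f.message}`);
  process.exit(1);
}
console.log("packet ok");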