Purpose
Model providers change model behavior, safety filters, latency, and pricing over time, but downstream teams often discover regressions only after users do. This project provides a change-watch manifest that stores prompts, model versions, outputs, expected invariants, and regression notes, so behavior changes can be caught early instead of in production.
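A minimal sketch of the manifest shape in TypeScript, mirroring the example packet below; the field names come from that example, and anything beyond them is an assumption:

// Sketch only: field names mirror the example packet; comments are assumptions.
interface Suite {
  name: string;   // e.g. "billing-agent-regression"
  owner: string;  // owning team
}

interface Run {
  model: string;       // provider model version, e.g. "model-a-2026-04"
  promptHash: string;  // hash of the exact prompt that was sent
  passed: boolean;     // whether the run satisfied the suite's invariants
}

interface ChangeWatchManifest {
  suite: Suite;
  runs: Run[];
  invariants: string[]; // named behavioral rules, e.g. "never_refund_without_policy_match"
}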
What it does
Validates a domain-specific AI governance packet, scores its readiness, and returns concrete findings that contributors can act on.
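To make "validates, scores, returns findings" concrete, here is a hedged sketch using the ChangeWatchManifest type above; validatePacket, scoreReadiness, and the scoring rule are illustrative, not the project's actual API:

// Illustrative only: names and the scoring rule are assumptions.
interface Finding {
  path: string;     // where in the packet the issue sits, e.g. "runs"
  message: string;  // what a contributor should fix
}

function validatePacket(packet: ChangeWatchManifest): Finding[] {
  const findings: Finding[] = [];
  if (packet.runs.length === 0) {
    findings.push({ path: "runs", message: "record at least one run" });
  }
  if (packet.invariants.length === 0) {
    findings.push({ path: "invariants", message: "declare at least one invariant" });
  }
  return findings;
}

function scoreReadiness(packet: ChangeWatchManifest): number {
  // One plausible rule: the fraction of runs that passed, as a 0-100 score.
  const passed = packet.runs.filter((r) => r.passed).length;
  return packet.runs.length ? Math.round((100 * passed) / packet.runs.length) : 0;
}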
Why it matters
AI systems are moving from chat into action. This project makes one hard operational risk, silent model behavior drift, easier to inspect, test, and govern in public.
Who should use it
Teams that need to track behavior changes across model versions and prompt releases. Builders can start with the CLI, then add adapters, fixtures, schemas, and integrations.
Quick Start
npm install
npm test
npm start -- sample
Example Packet
{
"suite": {
"name": "billing-agent-regression",
"owner": "platform"
},
"runs": [
{
"model": "model-a-2026-04",
"promptHash": "abc",
"passed": true
}
],
"invariants": [
"never_refund_without_policy_match"
]
}
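A usage sketch tying the pieces together: load a packet like the one above and report findings. The file name and the validatePacket/scoreReadiness helpers are the hypothetical ones from the earlier sketches:

import { readFileSync } from "node:fs";

// Hypothetical: reuses the types and helpers sketched earlier.
const packet = JSON.parse(readFileSync("packet.json", "utf8")) as ChangeWatchManifest;
const findings = validatePacket(packet);

console.log(`readiness: ${scoreReadiness(packet)}/100`);
for (const f of findings) {
  console.error(`${f.path}: ${f.message}`);
}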
Contribution Tracks
Good first issues
- provider adapters (see the adapter sketch after this list)
- scheduled eval runners
- diff visualizers
- alert integrations
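For the adapter track, one possible shape of a provider adapter; this contract is an assumption, not the project's defined interface:

// Hypothetical adapter contract; the project may define it differently.
interface ProviderAdapter {
  name: string; // e.g. "echo" or a real provider's name
  // Send one prompt to one model version and return the raw output.
  // Invariant checks and pass/fail scoring happen downstream.
  complete(model: string, prompt: string): Promise<string>;
}

// A stub adapter, handy for tests and fixtures: no network calls.
const echoAdapter: ProviderAdapter = {
  name: "echo",
  async complete(_model, prompt) {
    return prompt; // reflects the prompt back unchanged
  },
};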
Core improvements
- Add JSON Schema validation (a draft schema is sketched after this list).
- Add more real-world, non-sensitive fixtures.
- Improve scoring transparency and edge-case tests.
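A draft JSON Schema for the packet, derived from the example above and expressed as a TypeScript constant; which fields are required is an assumption to refine:

// Draft schema (2020-12) built from the example packet's fields.
const packetSchema = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  type: "object",
  required: ["suite", "runs", "invariants"],
  properties: {
    suite: {
      type: "object",
      required: ["name", "owner"],
      properties: { name: { type: "string" }, owner: { type: "string" } },
    },
    runs: {
      type: "array",
      items: {
        type: "object",
        required: ["model", "promptHash", "passed"],
        properties: {
          model: { type: "string" },
          promptHash: { type: "string" },
          passed: { type: "boolean" },
        },
      },
    },
    invariants: { type: "array", items: { type: "string" } },
  },
} as const;

A validator such as Ajv could compile this schema and turn violations into the concrete findings described above.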
Integration work
- Build adapters for common AI frameworks.
- Add CI checks and report exports (a CI sketch follows this list).
- Connect the packet format to operational workflows.
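As one CI sketch: a small script that exits non-zero when validation fails, so any CI system can gate merges on packet health. The script path and validatePacket are hypothetical, carried over from the earlier sketches:

// scripts/check-packet.ts (hypothetical path)
// Exits non-zero when the packet has findings, so CI can block the merge.
import { readFileSync } from "node:fs";

const packet = JSON.parse(readFileSync(process.argv[2] ?? "packet.json", "utf8"));
const findings = validatePacket(packet); // hypothetical validator from the sketches above

if (findings.length > 0) {
  for (const f of findings) console.error(`FAIL ${f.path}: ${f.message}`);
  process.exit(1);
}
console.log("packet ok");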