
Clinical Note De-ID Open

Open PHI risk checks for AI-ready clinical text pipelines.

Python · MIT licensed · Offline by default · Community extensible

Purpose

Clinical AI needs data, but unsafe de-identification can expose patients and block responsible research.

Clinical Note De-ID Open is a de-identification readiness checker that records PHI classes, replacement strategies, reviewer status, and residual risk before a corpus is released.

What it does

Validates a domain-specific AI governance packet, scores release readiness, and returns concrete findings that contributors can act on.
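A minimal sketch of that validate-score-report loop. The field names mirror the example packet later in this README; the function name, weights, and finding messages are illustrative, not the project's actual API.

```python
# Hypothetical readiness scorer. Weights and messages are illustrative;
# they are not the project's real scoring rules.

REQUIRED_DEID_FIELDS = {"names", "dates", "locations"}

def score_packet(packet: dict) -> tuple[int, list[str]]:
    """Return a 0-100 readiness score and a list of human-readable findings."""
    findings = []
    score = 100

    # Every PHI class needs a declared replacement strategy.
    deid = packet.get("deid", {})
    for field in sorted(REQUIRED_DEID_FIELDS - deid.keys()):
        findings.append(f"missing de-identification strategy for '{field}'")
        score -= 25

    # Release requires reviewer sign-off and an acceptable residual risk.
    review = packet.get("review", {})
    if not review.get("privacyOfficer"):
        findings.append("no privacy officer has signed off")
        score -= 30
    if review.get("residualRisk") in ("medium", "high"):
        findings.append(f"residual risk is {review['residualRisk']}")
        score -= 20

    return max(score, 0), findings
```

Each unmet requirement both lowers the score and yields a concrete finding, so contributors see exactly what to fix rather than a bare number.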

Why it matters

AI systems are moving from chat into action. This project makes one hard operational risk easier to inspect, test, and govern in public.

Who should use it

Anyone building PHI risk checks into AI-ready clinical text pipelines. Builders can start with the CLI, then add adapters, fixtures, schemas, and integrations.

Quick Start

# run the unit tests
PYTHONPATH=src python3 -m unittest discover -s tests

# try the CLI's sample command
python3 -m clinical_note_deid_open.cli sample

Example Packet

{
  "corpus": {
    "name": "cardiology-notes",
    "count": 2500
  },
  "deid": {
    "names": "surrogate",
    "dates": "shifted",
    "locations": "generalized"
  },
  "review": {
    "privacyOfficer": "",
    "residualRisk": "medium"
  }
}
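The packet's `deid` section names three replacement strategies: surrogate names, shifted dates, and generalized locations. A sketch of what each might mean in practice; the helper names and the surrogate pool are hypothetical, not part of this project.

```python
# Illustrative implementations of the strategies named in the packet.
# Helper names, offsets, and the surrogate pool are hypothetical.

import datetime
import random

def shift_date(iso_date: str, offset_days: int) -> str:
    """Shift a date by a fixed per-patient offset so intervals between
    visits are preserved while absolute dates are hidden."""
    d = datetime.date.fromisoformat(iso_date)
    return (d + datetime.timedelta(days=offset_days)).isoformat()

def surrogate_name(rng: random.Random) -> str:
    """Replace a real name with a surrogate drawn from a fixed pool."""
    return rng.choice(["PATIENT-A", "PATIENT-B", "PATIENT-C"])

def generalize_location(city: str, region: str) -> str:
    """Drop the specific city and keep only the broader region."""
    return region
```

Using one constant offset per patient (rather than a fresh offset per date) keeps the clinical timeline intact, which matters for longitudinal research.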

Contribution Tracks

Good first issues

  • FHIR export
  • PHI pattern packs
  • Review dashboards
  • Risk scoring research

Core improvements

  • Add JSON Schema validation.
  • Add more real-world, non-sensitive fixtures.
  • Improve scoring transparency and edge-case tests.

Integration work

  • Build adapters for common AI frameworks.
  • Add CI checks and report exports.
  • Connect the packet format to operational workflows.