# PIF v0.1 — Preflight Interchange Format

**Status:** Draft. Initial public release.
**Authors:** RegCheck (initial author of the spec; open to additional contributors)
**License:** Apache 2.0 (spec and reference implementation)

## What PIF is

PIF (Preflight Interchange Format) is an open JSON specification for representing the inputs and outputs of an AI workflow preflight assessment in regulated industries — pharma being the initial target.

PIF defines the shape of three artifacts:

- `WorkflowDescription` — what an AI workflow is intended to do
- `PreflightAssertion` — the assessment of that workflow against applicable regulatory and operational considerations
- `PreflightSession` — the durable record wrapping description, assertion, tool invocations, and reviewer actions

PIF does NOT define:
- The content of any specific regulatory requirement
- How a preflight should reason about applicability
- Confidence calibration methodology
- Acceptable thresholds for any field

These are deliberately left to implementations. PIF describes the **shape** of preflight artifacts so they can flow between tools, audit systems, and reviewers — not the substance of regulatory interpretation.

PIF v0.1 is tuned for pharma and GxP contexts — that is the initial author's working domain. The artifact shape is general enough to apply to medtech, diagnostics, banking, and critical-infrastructure preflight scenarios; future minor versions may add domain-specific vocabularies. Experiments in adjacent regulated sectors are encouraged.

## Why a shared format

Pharma teams deploying AI workflows currently document compliance considerations in inconsistent ways: Confluence pages, Word documents in change controls, fields in RIM systems, sometimes nowhere. When an auditor or inspector asks "how did you assess whether this AI workflow was appropriate?", there is no shared shape to the answer.

PIF aims to be to AI workflow preflight what SBOM is to software supply chains, OpenAPI is to APIs, or SPDX is to licensing metadata: a portable, validatable, machine-readable artifact format that organizations and tools can produce and consume consistently.

## Conceptual model

PIF is **workflow-first**. The primary object is the workflow being assessed, not the regulatory document being searched. Requirements are *projected onto* workflows; they are not the starting point.

This matters because it forces preflight outputs to answer the question the workflow owner actually has — "is this workflow appropriate given what it's trying to do?" — rather than the question search engines answer — "what regulations exist about topic X?"

Three principles run through the design:

- **Honesty by architecture.** Required fields like `assumptions_made`, `clarifying_questions`, `out_of_scope`, and `applicability_basis` make uncertainty and limits structural, not optional disclaimer text.
- **Composable primitives.** A `PreflightAssertion` is built from multiple tool invocations recorded in the `PreflightSession`. Producers may decompose the work however they choose, but the artifact shape is consistent.
- **Replayability.** With `corpus_snapshot` and `produced_by.methodology` populated, a PIF document carries enough metadata that the assessment can be reasoned about, and ideally reproduced, months or years later.

## Lifecycle

The three artifact types form a forward flow. A `WorkflowDescription` is authored by the workflow owner (or generated from intake) and describes intent. A `PreflightAssertion` is produced by a preflight tool against that description and records the assessment. A `PreflightSession` wraps both, plus the tool invocations and reviewer actions in between, into a single durable record that can be archived, audited, and replayed.

```
  WorkflowDescription          PreflightAssertion          PreflightSession
  ──────────────────           ──────────────────          ────────────────
  designer authors      ▸      tool produces        ▸      both wrapped +
  intent + scope               findings + basis            tool calls + reviews
                                                           stored durably
```

The boundary is deliberate: descriptions and assertions can flow independently (e.g., a description can be re-assessed by a different tool), while the session is the system-of-record envelope an audit function holds onto.
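The wrapping step can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: `build_session` and the `ps_` ID scheme are hypothetical, the check that `workflow_ref` equals `workflow_id` is an assumption about how the two artifacts are linked, and the fields that record tool invocations and reviewer actions are omitted because their names are not given in this document.

```python
import uuid
from datetime import datetime, timezone


def build_session(description: dict, assertion: dict) -> dict:
    """Wrap a description and its assertion into a PreflightSession-shaped dict."""
    # Assumption: workflow_ref carries the workflow_id of the assessed description.
    if assertion["workflow_ref"] != description["workflow_id"]:
        raise ValueError("assertion does not reference this workflow description")
    return {
        "pif_version": "0.1",
        # Hypothetical ID scheme; this document does not mandate one.
        "session_id": f"ps_{uuid.uuid4().hex}",
        "workflow_description": description,
        "preflight_assertion": assertion,
        # UTC timestamp in ISO-8601 form, matching the style of produced_at.
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


# Stubs with only the linking fields; real artifacts carry all required fields.
description = {"pif_version": "0.1", "workflow_id": "wf_minimal_example_001"}
assertion = {"pif_version": "0.1", "workflow_ref": "wf_minimal_example_001"}
session = build_session(description, assertion)
```

Because the description and assertion are embedded unchanged, either one can still be extracted from the session and handed to a different tool.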

## Required fields

PIF v0.1 is strict on a minimal core, permissive on extensions.

### WorkflowDescription — required

- `pif_version`
- `workflow_id`
- `intent`
- `ai_role`
- `output_destination`
- `human_gate`

Everything else is optional but recommended. A WorkflowDescription with only required fields is valid PIF; it will produce a less precise PreflightAssertion than one with more fields populated.

### PreflightAssertion — required

- `pif_version`
- `assertion_id`
- `workflow_ref`
- `produced_by` (with at minimum `tool` and `version`)
- `produced_at`
- `risk_classification` (with `level` and `rationale`)
- `applicable_requirements` (with applicability basis and source per requirement)
- `missing_controls`
- `assumptions_made`
- `verification_steps`
- `status`

The audit-defensibility argument hinges on `missing_controls`, `assumptions_made`, and `verification_steps` being mandatory. A preflight artifact that does not state what assumptions it made, what controls it identified as missing, and how a human can verify it independently is not an audit artifact; it is just an answer.

### PreflightSession — required

- `pif_version`
- `session_id`
- `workflow_description`
- `preflight_assertion`
- `created_at`
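
As a quick illustration of the three required-field lists, the sketch below reports which required fields a document lacks. It is a presence check only and does not replace validation against the JSON Schemas: the helper name `missing_required` is hypothetical, and an empty list for a field such as `assumptions_made` still counts as present here.

```python
REQUIRED = {
    "WorkflowDescription": [
        "pif_version", "workflow_id", "intent", "ai_role",
        "output_destination", "human_gate",
    ],
    "PreflightAssertion": [
        "pif_version", "assertion_id", "workflow_ref", "produced_by",
        "produced_at", "risk_classification", "applicable_requirements",
        "missing_controls", "assumptions_made", "verification_steps", "status",
    ],
    "PreflightSession": [
        "pif_version", "session_id", "workflow_description",
        "preflight_assertion", "created_at",
    ],
}

# Minimum sub-fields the spec names for two PreflightAssertion objects.
NESTED_REQUIRED = {
    "produced_by": ["tool", "version"],
    "risk_classification": ["level", "rationale"],
}


def missing_required(artifact_type: str, doc: dict) -> list:
    """Return the names of required fields absent from doc (presence only)."""
    missing = [name for name in REQUIRED[artifact_type] if name not in doc]
    if artifact_type == "PreflightAssertion":
        for parent, children in NESTED_REQUIRED.items():
            value = doc.get(parent)
            if isinstance(value, dict):
                missing += [f"{parent}.{c}" for c in children if c not in value]
    return missing


minimal = {
    "pif_version": "0.1",
    "workflow_id": "wf_minimal_example_001",
    "intent": "Summarize incoming customer complaints for triage.",
    "ai_role": "summarization",
    "output_destination": "draft_for_review",
    "human_gate": "review",
}
print(missing_required("WorkflowDescription", minimal))  # []
```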

### Recommended shapes for `corpus_snapshot` and `produced_by.methodology`

`corpus_snapshot` and `produced_by.methodology` are optional in v0.1 but central to replayability. Implementations are encouraged to populate them with a stable shape so consumers can reason about provenance without per-tool special cases.

Recommended fields for `corpus_snapshot`:

- `name` — human-readable corpus identifier (e.g., `preclari-eu-gmp`)
- `version` — corpus version string, monotonic
- `snapshot_date` — ISO-8601 timestamp of the snapshot
- `source_count` — integer count of source documents in the snapshot
- `hash` — content hash of the snapshot (e.g., `sha256:...`) so consumers can detect drift

Recommended fields for `produced_by.methodology`:

- `name` — methodology identifier (e.g., `preclari-method`)
- `version` — methodology version (e.g., `1.0`)
- `reference_url` — URL to the published methodology document

These shapes are not enforced by the v0.1 JSON Schemas (the fields accept a free string today for backward compatibility). They are documented here to keep implementations from diverging on what "snapshot" and "methodology" mean in practice.
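
One way a producer might populate the recommended `corpus_snapshot` shape is sketched below. The hashing scheme is an assumption made for illustration (the spec does not say what the content hash covers): each document's SHA-256 digest is folded into a running hash in sorted-ID order, so the same set of documents always yields the same value. `build_corpus_snapshot` is a hypothetical helper name.

```python
import hashlib
from datetime import datetime, timezone


def build_corpus_snapshot(name, version, documents, snapshot_date):
    """Build a corpus_snapshot dict from a mapping of document id -> bytes."""
    digest = hashlib.sha256()
    # Sorted order makes the hash independent of how the mapping was built.
    for doc_id in sorted(documents):
        digest.update(doc_id.encode("utf-8"))
        digest.update(b"\0")  # separator between the id and the content digest
        digest.update(hashlib.sha256(documents[doc_id]).digest())
    return {
        "name": name,
        "version": version,
        "snapshot_date": snapshot_date.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        "source_count": len(documents),
        "hash": "sha256:" + digest.hexdigest(),
    }


# Placeholder documents; a real corpus would hold the retrieved source texts.
docs = {"annex-11": b"placeholder text A", "annex-15": b"placeholder text B"}
snapshot = build_corpus_snapshot(
    "preclari-eu-gmp", "1", docs, datetime(2026, 5, 12, tzinfo=timezone.utc)
)
```

A consumer holding two snapshots can then detect drift by comparing the `hash` values without re-reading the corpus.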

## Examples

### Minimal valid WorkflowDescription

The smallest PIF artifact is a `WorkflowDescription` populated with only the required fields. It is valid against `workflow-description.schema.json` but will produce a thin assertion.

```json
{
  "pif_version": "0.1",
  "workflow_id": "wf_minimal_example_001",
  "intent": "Summarize incoming customer complaints into a structured report for the quality team to triage.",
  "ai_role": "summarization",
  "output_destination": "draft_for_review",
  "human_gate": "review"
}
```

### Rich PreflightAssertion (excerpted)

A `PreflightAssertion` with most optional fields populated. Excerpted for readability; the full document lives at [`spec/v0.1/examples/preflight-assertion.example.json`](https://github.com/cfpramod/preclari-mcp/blob/main/spec/v0.1/examples/preflight-assertion.example.json).

```json
{
  "pif_version": "0.1",
  "assertion_id": "pa_2026_001_a",
  "workflow_ref": "wf_qd_triage_2026_001",
  "produced_by": {
    "tool": "preclari",
    "version": "0.4.2",
    "methodology": "preclari-method-v1.0",
    "provider": "anthropic",
    "model_used": "claude-sonnet-4.6",
    "model_assignments": {
      "risk_classification": "claude-sonnet-4.6",
      "requirement_projection": "claude-sonnet-4.6"
      // + 3 more steps
    },
    "entitlements": {
      "jurisdictions": ["EU", "US", "CH", "UK"],
      "gxp_domains": ["GMP", "GDP", "GCP", "GLP", "GVP", "CSV", "data_integrity", "quality_systems"]
    }
  },
  "produced_at": "2026-05-17T14:30:00Z",
  "corpus_snapshot": {
    "snapshot_id": "corpus_2026_w19",
    "snapshot_hash": "sha256:7f3e8a2b...4f",
    "snapshot_date": "2026-05-12T00:00:00Z",
    "source_count": 247
  },
  "risk_classification": {
    "level": "medium",
    "rationale": "Workflow influences GxP decisions but human approval gate is required before any action. Lifecycle stage is pilot, narrowing the operational scope.",
    "drivers": [
      "data_classes includes gxp_record",
      "human_gate=approve_each (every output reviewed)",
      "lifecycle_stage=pilot (limited blast radius)"
    ]
  },
  "applicable_requirements": [
    {
      "requirement_id": "req_001",
      "requirement_text": "Computerized systems that influence GxP decisions are subject to validation expectations proportionate to their risk and intended use.",
      "source": {
        "url": "https://example-regulator.eu/annex-11",
        "canonical_document_id": "EU-GMP-Annex-11",
        "issuing_authority": "European Commission",
        "jurisdiction": "EU",
        "document_type": "annex",
        "effective_date": "2011-06-30",
        "content_hash": "sha256:a1b2c3d4...b2"
      },
      "applicability_basis": "The workflow uses an AI system to produce structured recommendations that feed into GxP decisions. Annex 11 applies because the system is computerised, used in GxP context, and influences regulated decisions.",
      "confidence": "high",
      "jurisdictional_scope": ["EU"]
    }
    // + 2 more requirements
  ],
  "missing_controls": [
    {
      "control": "documented_user_requirements_specification",
      "rationale": "Annex 11 expects URS for computerised systems influencing GxP decisions. WorkflowDescription does not indicate whether a URS exists.",
      "criticality": "required",
      "related_requirements": ["req_001"]
    }
    // + 5 more controls
  ],
  "assumptions_made": [
    {
      "assumption": "The Basel facility holds a Swiss GMP manufacturing authorization from Swissmedic.",
      "impact_if_wrong": "Swiss-specific requirements would not apply; assessment narrows to EU-only.",
      "basis": "Inferred from context_notes mention of Basel manufacturing without explicit confirmation."
    }
    // + 2 more assumptions
  ],
  "verification_steps": [
    {
      "step": "Confirm the EU GMP Annex 11 citation matches the requirement text in req_001 and that the retrieved document version remains current.",
      "type": "source_check",
      "applies_to": "applicable_requirements[0]"
    }
    // + 2 more steps
  ],
  "recommendation": "Treat as a controlled pilot with missing_controls addressed before first live use. Document the qualification approach proportionate to medium risk classification.",
  "status": "draft",
  "notice": "This assertion is informational and does not constitute regulatory advice."
}
```

The `// + N more` markers indicate elision for readability. The on-disk fixture is pure JSON — comments are not part of the format.

## Controlled vocabularies

PIF v0.1 defines closed enums for the following fields. Extension via the `extensions` object is permitted but the core vocabularies are normative. Several enums encode distinctions that are non-obvious to non-pharma readers; the values below carry short glosses so producers and consumers agree on meaning.

### `ai_role` — what the AI is asked to do

- `decision` — AI makes the final call without human approval in the workflow.
- `recommendation` — AI proposes; a human approves or rejects before action.
- `draft` — AI produces content for a human to edit and own.
- `classification` — AI assigns categories or labels to inputs.
- `extraction` — AI pulls structured data from unstructured sources.
- `copilot` — AI suggests within a human-driven workflow, no separate approval step.
- `summarization` — AI condenses content for human consumption.

### `output_destination` — where the AI's output flows

- `advisory` — informs a human; not stored as a record.
- `regulated_decision` — becomes part of a GxP decision or regulatory submission.
- `system_of_record` — written to a controlled system (QMS, RIM, LIMS, etc.).
- `draft_for_review` — staged for human edit before any downstream use.
- `automated_action` — triggers downstream action without human review.
- `archive` — stored but not actioned.

### `human_gate` — how a human stays in the loop

- `none` — no human in the loop; the AI's output flows directly downstream.
- `review` — optional human review; the human may or may not look at any given output.
- `approve_each` — reviewer approves every model output individually before downstream use.
- `approve_batch` — reviewer approves outputs in batches (e.g., a daily review of the day's outputs).
- `post_hoc_audit` — outputs flow without prior review; a sample is audited after the fact.

### `reversibility` — can the workflow's action be undone

- `reversible` — the downstream action can be reversed without material consequence.
- `partial` — some effects can be reversed; some cannot (e.g., a draft is reversible but downstream perception is not).
- `irreversible` — the action cannot be undone once taken (e.g., batch release, regulatory submission).

### `lifecycle_stage` — where the workflow sits

- `design` — pre-implementation; design and feasibility work.
- `pilot` — limited deployment with controlled scope and elevated monitoring.
- `production` — full deployment under steady-state controls.
- `retirement` — being decommissioned; included so PIF can describe sunsetted workflows.

### `risk_tolerance` — the workflow owner's stated tolerance

- `very_low` — zero-defect target; controls scoped for worst-case scenarios.
- `low` — conservative; controls scoped for likely failure modes.
- `medium` — balanced; standard industry practice.
- `high` — exploratory or low-stakes; lighter controls accepted.

### Other closed enums

- `data_classes`: `gxp_record`, `pii`, `phi`, `manufacturing_data`, `clinical_data`, `regulatory_submission`, `quality_data`, `safety_data`, `supply_chain_data`, `commercial_data`, `other`
- `gxp_domains`: `GMP`, `GDP`, `GCP`, `GLP`, `GVP`, `CSV`, `data_integrity`, `quality_systems`, `regulatory_affairs`, `pharmacovigilance`, `labeling`
- `risk_level` (assertion output): `low`, `medium`, `high`, `critical`
- `confidence`: `high`, `medium`, `low`, `contested`
- `status`: `draft`, `in_review`, `approved`, `contested`, `superseded`

Jurisdictions use ISO 3166 alpha-2 codes plus recognized supranational codes (`EU`, `ICH`, `WHO`, `PICS`).
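
The glossed vocabularies above translate directly into a lookup table. The sketch below flags `WorkflowDescription` values that fall outside the core enums. It assumes all six fields sit at the top level of the document; the minimal example confirms this for `ai_role`, `output_destination`, and `human_gate`, while the placement of the others is an assumption. `enum_violations` is a hypothetical helper name.

```python
CORE_ENUMS = {
    "ai_role": {
        "decision", "recommendation", "draft", "classification",
        "extraction", "copilot", "summarization",
    },
    "output_destination": {
        "advisory", "regulated_decision", "system_of_record",
        "draft_for_review", "automated_action", "archive",
    },
    "human_gate": {
        "none", "review", "approve_each", "approve_batch", "post_hoc_audit",
    },
    "reversibility": {"reversible", "partial", "irreversible"},
    "lifecycle_stage": {"design", "pilot", "production", "retirement"},
    "risk_tolerance": {"very_low", "low", "medium", "high"},
}


def enum_violations(description: dict) -> dict:
    """Map each enum field holding a non-core value to that value."""
    violations = {}
    for field, allowed in CORE_ENUMS.items():
        if field not in description:
            continue  # optional fields may simply be absent
        value = description[field]
        if not isinstance(value, str) or value not in allowed:
            violations[field] = value
    return violations


print(enum_violations({"ai_role": "summarization", "human_gate": "approve_some"}))
# {'human_gate': 'approve_some'}
```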

## Extensions

Implementations may add fields via the `extensions` object on any top-level type. Field names within extensions should be prefixed with the implementing tool's identifier (e.g., `preclari:custom_field`).

Implementations MUST NOT reject documents containing unknown extensions. Forward compatibility is a hard requirement.

Example: a vendor adding internal scoring fields under its own namespace.

```json
"extensions": {
  "preclari:internal_risk_score": 0.74,
  "preclari:calibration_method": "isotonic_v2"
}
```

A consumer that does not recognise the `preclari:` namespace MUST still accept and pass through the document; the values can be ignored, surfaced as unknown, or carried forward, but the document itself remains valid PIF.
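
A tolerant consumer might handle this as sketched below: it separates the extension keys it understands from those it does not, and leaves the document untouched so that unknown values can be carried forward. `split_extensions` and the `acme:` namespace are hypothetical names used only for illustration.

```python
def split_extensions(doc: dict, known_namespaces: set) -> tuple:
    """Partition extension entries into (understood, unknown) without rejecting any."""
    understood, unknown = {}, {}
    for key, value in doc.get("extensions", {}).items():
        # The namespace is the part before the first colon, e.g. "preclari".
        namespace = key.split(":", 1)[0] if ":" in key else ""
        target = understood if namespace in known_namespaces else unknown
        target[key] = value
    return understood, unknown


doc = {
    "pif_version": "0.1",
    "extensions": {
        "preclari:internal_risk_score": 0.74,
        "preclari:calibration_method": "isotonic_v2",
        "acme:reviewed": True,  # hypothetical second vendor namespace
    },
}
understood, unknown = split_extensions(doc, known_namespaces={"acme"})
```

Here `unknown` holds the two `preclari:` entries. The consumer can log or ignore them, but it never raises on them and never strips them from `doc`.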

### Extending closed enums

Closed enums (`ai_role`, `output_destination`, `human_gate`, `reversibility`, `lifecycle_stage`, `risk_tolerance`, `risk_level`, `confidence`, `status`, `data_classes`, `gxp_domains`, `document_type`) MUST NOT be extended ad-hoc by implementations. A producer that emits an enum value not defined in this version of the spec is producing a non-conformant document.

To propose a new enum value:

1. **Preferred** — open a pull request against the spec repository describing the use case, why no existing value fits, and the backward-compatibility implication. Accepted proposals ship in the next minor version.
2. **Otherwise**, carry the implementation-specific value under the `extensions` object with a vendor namespace (e.g., `"extensions": { "preclari:custom_risk_level": "elevated" }`) rather than reusing the core enum field.

This rule exists so that a tool consuming PIF can rely on the core enums having a fixed shape. Forward compatibility lives in `extensions`, not in the closed vocabularies.

## Signatures

PIF v0.1 defines a `signature` field on `PreflightAssertion`. The field is optional in the spec; implementations may require it for specific use cases (e.g., paid tiers, audit submissions).

Recommended:
- Sign over the canonicalized form of the document with the `signature` field removed
- Canonicalization: JCS (JSON Canonicalization Scheme, RFC 8785)
- Algorithms: Ed25519 (recommended), RSA-PSS-SHA256, ECDSA-P256-SHA256

The verification model is intentionally simple. There is no PIF-level PKI. Implementations publish their own public keys and own the trust relationship with their users.
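
The first recommendation, signing over the document with `signature` removed, can be sketched with the standard library. The serialization below only approximates JCS: it matches RFC 8785 for documents whose keys are ASCII and whose values are strings, integers, booleans, null, arrays, and objects, but it does not implement the RFC's number formatting or UTF-16 key ordering. A production signer should use a full RFC 8785 implementation. `signing_input` is a hypothetical helper name.

```python
import json


def signing_input(assertion: dict) -> bytes:
    """Return the bytes to sign: the document minus `signature`, canonicalized."""
    unsigned = {k: v for k, v in assertion.items() if k != "signature"}
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8.
    text = json.dumps(
        unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


signed = {"status": "draft", "assertion_id": "pa_2026_001_a", "signature": "..."}
unsigned = {"assertion_id": "pa_2026_001_a", "status": "draft"}
print(signing_input(signed) == signing_input(unsigned))  # True
```

The resulting bytes are what a signer would pass to its Ed25519 (or other) signing routine, and what a verifier would recompute before checking the signature against the implementation's published public key.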

## JSON Schema URLs

The JSON Schemas for the three PIF artifact types are published at stable versioned URLs. Each schema's `$id` is its public URL, and `$schema` is `https://json-schema.org/draft/2020-12/schema`.

- [`https://preclari.com/pif/v0.1/workflow-description.schema.json`](https://preclari.com/pif/v0.1/workflow-description.schema.json)
- [`https://preclari.com/pif/v0.1/preflight-assertion.schema.json`](https://preclari.com/pif/v0.1/preflight-assertion.schema.json)
- [`https://preclari.com/pif/v0.1/preflight-session.schema.json`](https://preclari.com/pif/v0.1/preflight-session.schema.json)

The JSON-LD context for the same vocabulary is published at:

- [`https://preclari.com/pif/v0.1/context.jsonld`](https://preclari.com/pif/v0.1/context.jsonld)
- Also mirrored at the `.well-known` path: [`https://preclari.com/.well-known/pif/v0.1/context.jsonld`](https://preclari.com/.well-known/pif/v0.1/context.jsonld)

These URLs are stable for the lifetime of v0.1. Future minor versions (v0.2, etc.) publish at their own paths; v0.1 URLs do not move.

## Validation

The repository includes JSON Schema files in `spec/v0.1/`. Any standards-compliant JSON Schema validator (Draft 2020-12) can validate PIF documents.

A CLI validator is provided in `validator/`:

```
pif-validate path/to/document.json
```

The validator exits with status 0 when the document conforms. Otherwise it exits with a non-zero status and reports line-level errors.
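
The exit-code contract makes the CLI easy to use as a gate in a pipeline. The wrapper below is a sketch that relies only on the documented invocation (`pif-validate <path>`) and the zero or non-zero exit status. Which output stream carries the error report is not stated in this document, so both are forwarded. `conforms` is a hypothetical helper name, and the `command` parameter exists so the wrapper can be pointed at a different validator.

```python
import subprocess
import sys
from typing import Sequence


def conforms(path: str, command: Sequence[str] = ("pif-validate",)) -> bool:
    """Run the validator on one document and report whether it conformed."""
    result = subprocess.run([*command, path], capture_output=True, text=True)
    if result.returncode != 0:
        # Forward whatever the validator reported so the caller can see it.
        sys.stderr.write(result.stdout + result.stderr)
    return result.returncode == 0
```

A pipeline step could then call `conforms("path/to/document.json")` and stop when it returns `False`.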

### Conformance test suite

A conformance test bundle for PIF v0.1 lives at [`spec/v0.1/conformance/`](https://github.com/cfpramod/preclari-mcp/tree/main/spec/v0.1/conformance). It carries a manifest of 25 test cases: 6 valid examples across the risk tiers, plus 19 single-violation invalid fixtures, each testing exactly one schema rule (missing required fields, closed-enum violations, pattern violations, length violations, type violations, `additionalProperties: false` violations). A reference TypeScript harness validates every case against the declared expectation and asserts that the validator surfaced the expected violation; CI runs the suite on every PR. Implementers of PIF validators should run the suite against their own tool. A validator that disagrees on any case is not v0.1-conformant.
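
The idea behind the harness can be illustrated in Python. This is not the reference TypeScript harness, and the manifest shape used here (`id`, `document`, `expect`, `expected_violation`) is a hypothetical stand-in for the real manifest format. The validator under test is modelled as a function that returns a list of violation labels.

```python
from typing import Callable, List


def run_conformance(cases: List[dict], validate: Callable[[dict], List[str]]) -> List[str]:
    """Return the ids of cases where the validator disagrees with the manifest."""
    disagreements = []
    for case in cases:
        violations = validate(case["document"])
        if case["expect"] == "valid":
            agrees = not violations
        else:
            # Invalid fixtures must surface the one violation they target.
            agrees = case["expected_violation"] in violations
        if not agrees:
            disagreements.append(case["id"])
    return disagreements


def toy_validate(doc: dict) -> List[str]:
    """Toy validator that knows a single rule: workflow_id is required."""
    return [] if "workflow_id" in doc else ["missing:workflow_id"]


cases = [
    {"id": "valid-01", "document": {"workflow_id": "wf_1"}, "expect": "valid"},
    {"id": "invalid-01", "document": {}, "expect": "invalid",
     "expected_violation": "missing:workflow_id"},
]
print(run_conformance(cases, toy_validate))  # []
```

An empty result means the validator agreed with every declared expectation. Any returned ID marks a case on which the validator is not conformant.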

## JSON and JSON-LD

PIF documents are valid JSON. The repository also includes a JSON-LD `@context` at `spec/v0.1/context.jsonld` that maps PIF fields to semantic IRIs.

The same field shapes serve both modes. A flat JSON PIF document can be made JSON-LD-compatible by adding `"@context": "https://preclari.com/pif/v0.1/context.jsonld"` at the top level. A JSON-LD PIF document with the `@context` removed is valid flat JSON.
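
The two directions described above amount to adding or dropping a single key. A minimal sketch, with hypothetical helper names `to_jsonld` and `to_flat_json`:

```python
PIF_CONTEXT = "https://preclari.com/pif/v0.1/context.jsonld"


def to_jsonld(doc: dict) -> dict:
    """Return a copy of a flat PIF document with the v0.1 @context added."""
    return {"@context": PIF_CONTEXT, **{k: v for k, v in doc.items() if k != "@context"}}


def to_flat_json(doc: dict) -> dict:
    """Return a copy of a JSON-LD PIF document with @context removed."""
    return {k: v for k, v in doc.items() if k != "@context"}


flat = {"pif_version": "0.1", "workflow_id": "wf_minimal_example_001"}
linked = to_jsonld(flat)
print(to_flat_json(linked) == flat)  # True
```

Both helpers return new dictionaries, so the original document is never modified in place.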

Implementations are encouraged to support both. Most consumers will use flat JSON. Knowledge graph, provenance, and semantic reasoning integrations benefit from JSON-LD.

### Example: PIF inside a provenance graph

A common JSON-LD use case is linking a `PreflightAssertion` to the workflow it assesses inside a broader provenance graph (e.g., a QMS or RIM system that already exposes its records as linked data):

```json
{
  "@context": "https://preclari.com/pif/v0.1/context.jsonld",
  "@type": "PreflightAssertion",
  "assertion_id": "pa_2026_001_a",
  "workflow_ref": "wf_qd_triage_2026_001",
  "describes": {
    "@id": "urn:qms:workflow:wf_qd_triage_2026_001"
  },
  "produced_by": {
    "tool": "preclari",
    "version": "0.4.2"
  },
  "produced_at": "2026-05-17T14:30:00Z",
  "status": "approved"
}
```

The `describes` link is an extension pattern using a `urn:` identifier so the assertion can resolve into a host system's namespace (`urn:qms:...`, `urn:rim:...`, `https://example.org/qms/...`) without PIF taking a position on which QMS or RIM the consumer happens to use.

## Versioning

PIF follows semantic versioning at the spec level:

- **Patch** (0.1.x): documentation clarifications, non-normative additions
- **Minor** (0.x): additive changes that preserve backward compatibility (new optional fields, expanded enum values that are gracefully ignored by older validators)
- **Major** (x.0): breaking changes

Older spec versions remain available at versioned URLs indefinitely.

### Cadence

v0.1 is intended to remain stable for at least 6 months from publication so implementers can ship against it without chasing the spec. v0.2 will focus on a formal vocabulary for control names (currently free-text in `missing_controls.control`) and on RIM integration shapes — both already listed under "What's not in v0.1". Breaking changes between minor versions are not permitted; anything breaking ships under a major version with a migration story.

## What's not in v0.1

Deliberately deferred to v0.2 or later:

- A formal vocabulary for control names (currently free-text in `missing_controls.control`)
- Schema for jurisdictional override rules
- Bidirectional links between WorkflowDescription and policy systems (e.g., RIM integration shapes)
- Multi-language support for `intent` and other free-text fields
- A digital ledger / transparency log for signed assertions

These are valuable but not necessary for v0.1 to be useful.

## Interoperability

PIF is designed to flow between systems already in place at organisations running regulated AI workflows. Implementers from any of these categories are welcome:

- **eQMS / QMS systems** — the natural system-of-record for `PreflightSession` documents and for downstream change control of `missing_controls`.
- **RIM (Regulatory Information Management) systems** — for linking `PreflightAssertion` documents to product, submission, and authorisation context.
- **CSV (Computerised Systems Validation) toolchains** — for capturing PIF artifacts as part of qualification evidence for AI-impacting computerised systems.
- **Agent frameworks** — MCP servers, LangChain, LlamaIndex, Bedrock Agents, and similar runtimes that produce `WorkflowDescription` documents from agent definitions and consume `PreflightAssertion` documents as preflight gating.
- **Audit / SIEM systems** — for ingesting signed `PreflightSession` documents into the audit trail for inspections and internal periodic review.

PIF takes no position on which of these systems an organisation runs. The format aims to be the portable artifact that flows between them.

## Governance

PIF is maintained by RegCheck as the initial author, with the intent to transition to an open working group as adoption broadens. Decisions are documented in [`GOVERNANCE.md`](https://github.com/cfpramod/preclari-mcp/blob/main/GOVERNANCE.md) at the repository root: change types, the review window for minor releases, the criteria for adding fields and enum values, and the deprecation policy. Breaking changes require a major version bump and a 30-day public RFC.

Once v0.1 sees adoption beyond RegCheck, the spec and reference implementation are intended to move to a neutral `pif-spec` GitHub organisation under continued Apache 2.0 licensing. This commitment is recorded here so future contributors can hold the maintainers to it.

## How to contribute

PIF is currently maintained by the RegCheck team but is intended to be a community spec. Contributions are welcome via the repository's issues and pull requests. See `GOVERNANCE.md` for the decision process.
