PIF v0.2 — Preflight Interchange Format
Status: Draft. Minor release over v0.1.
Cumulative validation: v0.2 schemas accept both v0.1 documents (pif_version: "0.1") and v0.2 documents (pif_version: "0.2").
Authors: Preclari (initial author of the spec; open to additional contributors).
License: Apache 2.0.
This document describes what is new in PIF v0.2 relative to v0.1. The full v0.1 specification at ../v0.1/spec.md remains the canonical reference for concepts, artifact types, and design principles unchanged between versions. v0.2 readers should read v0.1 first, then this delta.
What changes in v0.2
| Area | v0.1 | v0.2 |
|---|---|---|
| Domain field name | gxp_domains_self_declared (WD), gxp_domains_identified (PA) | regulatory_domains_self_declared, regulatory_domains_identified |
| Domain vocabulary shape | Closed enum of 11 GxP values + none_claimed | Semi-closed CURIE pattern: closed namespace enum (~40 entries) + open token pattern |
| Vocabulary scope | Pharma / GxP only | Cross-industry (medtech, banking, ESG, cyber, etc.) |
| missing_controls[].control | Free-text | Cumulative CURIE pattern (accepts both v0.1 free-text and v0.2 CURIE; free-text removed in v0.3) |
| New envelope field | (none) | pif_corpus_version (optional, RECOMMENDED for audit trail) |
| New requirement field | (none) | applicable_requirements[].conditional_applicability (optional boolean) |
| Required array (PA) | 11 fields | 13 fields when pif_version: "0.2"; adds out_of_scope and clarifying_questions (closes a v0.1 schema-vs-prose drift) |
| Schema $id path | pif-spec.github.io/pif/v0.1/ | pif-spec.github.io/pif/v0.2/ |
v0.1 documents continue to validate against v0.1 schemas indefinitely (v0.1 stability commitment is unchanged).
The vocabulary: semi-closed CURIE pattern
v0.2 replaces v0.1's closed GxP-only enum with a CURIE pattern that scales across the broader EU regulatory wave (and beyond) without requiring a spec PR for every new regulation token.
Format
Values in regulatory_domains_self_declared, regulatory_domains_identified, and missing_controls[].control follow:
<namespace>:<token>
OR the reserved literal none_claimed (for regulatory_domains_* only).
Schema validates structure, implementation validates semantics
The schema validates:
- Namespace is one of the closed enum (~40 entries).
- Token matches the pattern [a-zA-Z0-9_.-]+ (one or more characters).
- Single colon separator (no colon inside the token).
- Length ≤ 128 characters total.
The implementation/corpus layer validates:
- Whether the token is recognised in the current corpus snapshot.
- Whether the semantic combination of namespace + token is meaningful.
A producer that emits iso:99999 (well-formed CURIE, valid namespace, unrecognised token) produces a structurally valid PIF document. The reference implementation flags it with the advisory "valid PIF format, but unrecognised token in current corpus snapshot". This separates structural validity from factual recognition.
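The two layers described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the reference implementation: NAMESPACES is only a small subset of the closed enum, KNOWN_TOKENS is a made-up stand-in for a corpus snapshot, and both function names are hypothetical. The reserved literal none_claimed is not handled here.

```python
import re
from typing import Optional

# Small subset of the closed namespace enum, for illustration only.
NAMESPACES = {"gxp", "iso", "iso27001", "eu_ai_act", "ext"}

# Token character class from the spec; a colon cannot appear inside a token.
TOKEN_RE = re.compile(r"[a-zA-Z0-9_.-]+")

# Hypothetical corpus snapshot: namespace -> recognised tokens.
KNOWN_TOKENS = {"gxp": {"GMP"}, "iso27001": {"A.8.1.1"}}


def is_structurally_valid(value: str) -> bool:
    """Schema-layer check: length, namespace enum, token pattern."""
    if len(value) > 128:
        return False
    namespace, sep, token = value.partition(":")
    if not sep or namespace not in NAMESPACES:
        return False
    # fullmatch rejects empty tokens, whitespace, extra colons, non-ASCII.
    return TOKEN_RE.fullmatch(token) is not None


def corpus_advisory(value: str) -> Optional[str]:
    """Corpus-layer check: advisory text for an unrecognised token, else None."""
    namespace, _, token = value.partition(":")
    if token in KNOWN_TOKENS.get(namespace, set()):
        return None
    return "valid PIF format, but unrecognised token in current corpus snapshot"
```

With these definitions, iso:99999 passes the structural check and still receives an advisory, while gxp:a:b fails the structural check outright.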
Normative rules (locked in Round 4 design review)
- Case sensitivity. Namespace MUST be lowercase. Tokens are case-preserving and case-sensitive (gxp:GMP is valid and distinct from gxp:gmp).
- Whitespace. No leading, trailing, or internal whitespace anywhere. Producers MUST NOT trim; consumers MUST reject.
- Empty token. Rejected by the + pattern quantifier.
- Multiple colons. REJECTED in v0.2. The token character class excludes ":". Use "." or "_" for hierarchical citation (iso27001:A.8.1.1, gamp5:appendix.m4).
- Character encoding. US-ASCII only. Unicode tokens MUST be rejected. (Revisit in v0.3+ if international tokens become necessary.)
- Length. maxLength: 128 on the full CURIE string.
- Normalisation. Tokens are opaque identifiers. No canonicalisation. gxp:G.M.P ≠ gxp:GMP. Synonymy lives at the corpus layer, not the schema layer.
- Array uniqueness. uniqueItems: true on both regulatory_domains_self_declared and regulatory_domains_identified.
- Array canonicalisation for signing. Consumers preparing canonical forms (JCS-RFC8785 or equivalent) MUST lexicographically (byte-wise) sort the array before hashing.
- ext: reservation. The ext: namespace is permanently reserved for vendor-internal frameworks. MUST NOT be assigned as a formal spec namespace in any future version.
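The array-uniqueness and canonicalisation rules can be illustrated with a short Python sketch. It assumes SHA-256 as the hash and a compact JSON encoding of the array alone; neither is specified in this document, and a real signer would canonicalise the whole document with JCS (RFC 8785) or an equivalent. The function names are hypothetical.

```python
import hashlib
import json
from typing import List


def canonical_domains(domains: List[str]) -> List[str]:
    """Reject duplicates (uniqueItems) and return the array sorted byte-wise."""
    if len(set(domains)) != len(domains):
        raise ValueError("duplicate CURIE in array")
    # CURIEs are US-ASCII only; encode() raises UnicodeEncodeError otherwise.
    return sorted(domains, key=lambda value: value.encode("ascii"))


def domains_digest(domains: List[str]) -> str:
    """Hash a compact JSON encoding of the sorted array (assumed: SHA-256)."""
    payload = json.dumps(canonical_domains(domains), separators=(",", ":"))
    return hashlib.sha256(payload.encode("ascii")).hexdigest()
```

Because the sort is byte-wise and tokens are case-sensitive, gxp:GMP orders before gxp:gmp, and two producers that list the same domains in different orders arrive at the same digest.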
Namespace list (closed enum, v0.2.0)
Pharma / international: gxp, ich, iso, iec, imdrf, nist, iso27001, iso_42001, iso_23894, iso_14971, iec_62304, nist_sp800_53, gamp5, mdcg.
EU regulatory: eu_mdr, eu_ivdr, eu_ai_act, eu_eba, eu_ecb, eu_dora, eu_csrd, eu_cs3d, eu_eudr, eu_battery_regulation, eu_green_claims, eu_nis2, eu_cra, eu_data_act, eu_dsa, eu_dma, eu_mica, eu_gdpr, eu_sfdr.
US regulatory: us_fda_21cfr_803, us_fda_21cfr_820, us_fda_21cfr_11, us_fda_samd, us_hhs_hipaa, us_frb_sr_11_7.
UK + other jurisdictions: uk_mhra, jp_pmda, ch_swissmedic, ca_health_canada, au_tga. The non-UK jurisdictions ship in v0.2.0 with no enumerated recommended tokens — defensive future-proofing for global expansion.
Vendor escape hatch: ext. Vendors with bespoke internal frameworks prefix with ext:. Convention recommends ext:<vendor>_<token> or ext:<vendor>.<token> to avoid cross-vendor collisions; the schema does not enforce this.
New namespaces ship via 14-day PR per GOVERNANCE.md. New tokens under existing namespaces require no spec change.
Recommended tokens by namespace (non-normative, v0.2.0 baseline)
The recommended-token list for v0.2.0 will be published as recommended-tokens.md in this repository in a follow-up patch release. Producers MAY emit unrecognised tokens under known namespaces; the corpus-layer advisory mechanism handles them.
Semantic distinction: regulatory_domains vs missing_controls.control
Both fields share the lexical CURIE shape but answer different questions:
- A CURIE in regulatory_domains_self_declared or regulatory_domains_identified answers: what regulatory framework or scope applies to this workflow? Example: eu_ai_act:annex_iii_para_5_sub_b_creditworthiness.
- A CURIE in missing_controls[].control answers: what specific control is absent? Example: iso27001:A.8.1.1.
Tokens identify frameworks/scopes in one field and individual controls in the other. Producers SHOULD pick the namespace that matches the semantic level; mixed usage is not structurally invalid but is semantically incoherent and SHOULD be flagged by tooling.
New: pif_corpus_version envelope field
The semi-closed CURIE design intentionally separates structural validity from corpus recognition. Two producers running against different corpus snapshots will emit structurally-valid-but-token-disagreeing PIF documents. Without a corpus-version declaration, consumers cannot reproduce "unrecognised token" advisories deterministically.
v0.2 adds an OPTIONAL pif_corpus_version envelope field on PreflightAssertion:
{
  "pif_version": "0.2",
  "pif_corpus_version": "preclari-eu-gmp-2026-w20",
  "assertion_id": "...",
  ...
}
Format: opaque string (maxLength 200). Producers SHOULD use a stable identifier mapping to a specific corpus snapshot (e.g., <corpus-name>-<date>-<revision> or a content hash).
Producers emitting any token outside the v0.2.0 recommended-token list SHOULD populate pif_corpus_version. Producers emitting only recommended tokens MAY omit it.
Consumers MUST NOT interpret the string. The field is for audit trail and reproducibility only.
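A producer-side sketch of the guidance above, in Python. RECOMMENDED is a hypothetical stand-in for the v0.2.0 recommended-token list, which this document does not enumerate, and both function names are illustrative assumptions rather than part of any PIF tooling.

```python
from typing import Iterable, Set

# Hypothetical stand-in for the recommended-token list (not yet published).
RECOMMENDED: Set[str] = {"gxp:GMP", "iso27001:A.8.1.1"}


def should_declare_corpus_version(
    curies: Iterable[str], recommended: Set[str] = RECOMMENDED
) -> bool:
    """True if any emitted CURIE falls outside the recommended list."""
    return any(c != "none_claimed" and c not in recommended for c in curies)


def attach_corpus_version(assertion: dict, corpus_version: str) -> dict:
    """Return a copy of the assertion with pif_corpus_version set."""
    if len(corpus_version) > 200:
        raise ValueError("pif_corpus_version exceeds maxLength 200")
    # The value is opaque: stored as-is, never parsed.
    return {**assertion, "pif_corpus_version": corpus_version}
```

The check is advisory because the rule is SHOULD-level: a producer may still omit the field, at the cost of less reproducible advisories on the consumer side.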
New: conditional_applicability on applicable_requirements[]
v0.2 adds an OPTIONAL boolean field on each entry in applicable_requirements[]:
"applicable_requirements": [
  {
    "requirement_id": "req_001",
    "requirement_text": "...",
    "source": { ... },
    "applicability_basis": "...",
    "confidence": "high",
    "conditional_applicability": true
  }
]
When true, signals that this requirement's applicability is CONDITIONAL on one or more assumptions in assumptions_made resolving in a specific direction. When false or absent, the requirement is unconditionally applicable given the workflow's declared attributes.
The field decouples assumption-driven retrieval from precision loss: a preflight that surfaces a requirement on a hedged assumption can flag the requirement as conditional without weakening its visibility. Consumers SHOULD treat requirements with conditional_applicability: true as belonging to a "verify the assumption first" branch rather than the directly-applicable branch.
The cross-reference to specific assumptions is not formalised in v0.2 — the conditional flag signals "check assumptions_made" and consumers traverse that array for context. v0.3+ may add a structured related_assumptions: ["assumption_id"] link if assumptions_made gains stable IDs.
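On the consumer side, the two branches can be separated with a few lines of Python. This is a sketch only; the function name is hypothetical, and the assertion is assumed to be an already-parsed JSON object.

```python
from typing import List, Tuple


def split_by_applicability(assertion: dict) -> Tuple[List[dict], List[dict]]:
    """Partition requirements into (directly applicable, verify-assumption-first)."""
    direct, verify_first = [], []
    for req in assertion.get("applicable_requirements", []):
        # Absent and false both mean unconditionally applicable.
        if req.get("conditional_applicability") is True:
            verify_first.append(req)
        else:
            direct.append(req)
    return direct, verify_first
```

Requirements in the second list still need the consumer to read assumptions_made for context, since v0.2 carries no structured link between a requirement and the assumption it depends on.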
Tightened required array (PreflightAssertion)
v0.1 spec.md prose (../v0.1/spec.md) stated: "Honesty by architecture. Required fields like assumptions_made, clarifying_questions, out_of_scope, and applicability_basis make uncertainty and limits structural, not optional disclaimer text."
The v0.1 schema's required array did not include out_of_scope or clarifying_questions. Eval evidence (n=1) showed a real v0.1 assertion lacking out_of_scope passed conformance because of this schema-vs-prose drift.
v0.2 closes the drift. The v0.2 schema's required array adds out_of_scope and clarifying_questions — but only when pif_version: "0.2". v0.1 documents (pif_version: "0.1") continue to validate without those fields (cumulative validation preserves backward compat).
Mechanism: the v0.2 schema uses a JSON Schema if/then construct keyed on pif_version:
{
  "required": [...v0.1 set...],
  "allOf": [
    {
      "if": { "properties": { "pif_version": { "const": "0.2" } }, "required": ["pif_version"] },
      "then": { "required": ["out_of_scope", "clarifying_questions"] }
    }
  ]
}
A v0.2 producer that omits either field fails schema validation; a v0.1 document that omits them validates fine under cumulative validation.
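The effect of the if/then branch can be mirrored in plain Python for readers who want to see the logic without a JSON Schema validator. This sketch covers only the conditional part; the v0.1 required set is not reproduced here, and the function name is hypothetical.

```python
from typing import List

# Fields the "then" branch adds when pif_version is "0.2".
V02_EXTRA_REQUIRED = ("out_of_scope", "clarifying_questions")


def missing_v02_fields(doc: dict) -> List[str]:
    """Return the extra required fields that a v0.2 document is missing."""
    # "if" branch: applies only when pif_version is present and equals "0.2".
    if doc.get("pif_version") != "0.2":
        return []
    # "then" branch: "required" tests key presence, not the value.
    return [name for name in V02_EXTRA_REQUIRED if name not in doc]
```

An empty result means the conditional branch is satisfied; a v0.1 document always yields an empty result because the condition never fires.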
Migration from v0.1
The canonical machine-readable mapping is at migrations/v0.1-to-v0.2.json. Summary:
Field renames:
- gxp_domains_self_declared → regulatory_domains_self_declared (WD)
- gxp_domains_identified → regulatory_domains_identified (PA)
Value translation:
- Old GxP enum values get the gxp: prefix: "GMP" → "gxp:GMP", etc.
- "none_claimed" carries forward unchanged (reserved literal).
Backward compatibility:
- v0.1 documents validate against v0.1 schemas indefinitely.
- v0.1 documents ALSO validate against v0.2 schemas (cumulative validation).
- v0.2 documents do NOT validate against v0.1 schemas (new fields, new required entries — v0.1 schemas reject them).
Deprecation timeline:
- v0.2: gxp_domains_* deprecated (schema-level deprecated: true); v0.2 producers MUST NOT emit them; consumers MUST accept them from v0.1 documents.
- v0.3: still accepted (six-month minimum migration window for downstream corpus consumers per the v0.2 corpus retention rule).
- v1.0: removed.
- missing_controls[].control free-text shape: DEPRECATED in v0.2; REMOVED in v0.3.
Schema URLs:
- v0.1 schemas: https://pif-spec.github.io/pif/v0.1/...
- v0.2 schemas: https://pif-spec.github.io/pif/v0.2/...
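The field renames and value translation summarised in this section can be sketched as a small Python function. It hard-codes the two renames rather than reading migrations/v0.1-to-v0.2.json, whose layout is not shown here, and the function name is hypothetical.

```python
import copy

# Renames listed in the migration summary (WD and PA fields).
FIELD_RENAMES = {
    "gxp_domains_self_declared": "regulatory_domains_self_declared",
    "gxp_domains_identified": "regulatory_domains_identified",
}


def migrate_v01_to_v02(doc: dict) -> dict:
    """Return a v0.2-shaped copy of a v0.1 document; the input is not modified."""
    out = copy.deepcopy(doc)
    for old, new in FIELD_RENAMES.items():
        if old in out:
            values = out.pop(old)
            # Old GxP enum values gain the gxp: prefix; none_claimed is kept.
            out[new] = [v if v == "none_claimed" else f"gxp:{v}" for v in values]
    out["pif_version"] = "0.2"
    return out
```

A migrated PreflightAssertion must also carry out_of_scope and clarifying_questions to validate as v0.2. The sketch does not synthesise those fields, since their content has to come from the producer.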
Schema URLs
The v0.2 JSON Schemas are published at:
- https://pif-spec.github.io/pif/v0.2/workflow-description.schema.json
- https://pif-spec.github.io/pif/v0.2/preflight-assertion.schema.json
- https://pif-spec.github.io/pif/v0.2/preflight-session.schema.json
JSON-LD context:
https://pif-spec.github.io/pif/v0.2/context.jsonld
$id matches the public URL for each schema; consumers can use the URLs in $ref from their own schemas.
What's not in v0.2
- A formal regulatory_basis array on each applicable_requirements[] entry (deferred to v0.2.1).
- Removal of the gxp_domains_* deprecated fields (deferred to v1.0).
- Removal of the missing_controls[].control free-text branch (deferred to v0.3).
- A jurisdictional override mechanism (v0.1 carry-forward).
- A digital ledger / transparency log for signed assertions (v0.1 carry-forward).
- Multi-language fields for intent and other free-text (v0.1 carry-forward).
- New artifact types beyond the three v0.1 defined.
See also
- ../v0.1/spec.md: full v0.1 spec; canonical reference for concepts, lifecycle, examples, signatures, JSON-LD background.
- migrations/v0.1-to-v0.2.json: machine-readable v0.1 → v0.2 mapping.
- conformance/README.md: v0.2 conformance test suite.
- ../GOVERNANCE.md: spec governance, change types, deprecation policy.