Lexicon evolution policy

Every lexicon revision in idiolect ships with an auto-derived, classified, verified, published lens. Hand-authored lenses are an escape hatch that requires governance sign-off. The same policy applies to vendored externals (Blacksky, layers-pub, ...).

The policy is the lexicon-level half of the project's stability story; the schema-level half is the stability and versioning note. The policy below is enforced by scripts/lexicon-evolve.sh and the release CI workflow.

The six stages

flowchart LR
    DIFF[0. Diff] --> DERIVE[1. Auto-derivation]
    DERIVE --> CLASSIFY[2. Classify]
    CLASSIFY --> COERCE[3. Coercion-law check]
    COERCE --> ROUNDTRIP[4. Roundtrip verify]
    ROUNDTRIP --> PUBLISH[5. Publish lens]

Each stage maps onto a panproto primitive; nothing is bespoke.

Stage 0 — Diff

schema diff --src lexicons/<nsid>.<old>.json --tgt lexicons/<nsid>.<new>.json

Produces a structured change graph: vertex / edge additions, removals, renames, kind coercions, constraint tightenings. Cached under migrations/<nsid>/<old>-<new>/diff.json.

Stage 1 — Auto-derivation

schema lens generate <old>.json <new>.json --hints <hints>.json

Produces a protolens chain: a sequence of dependent optics, each parameterized by a precondition over schemas. The elementary constructors are listed in Lens semantics. Hints declare anchors for ambiguous renames; forward-chaining propagates declared anchors into derived ones before the CSP solver runs.

Stage 2 — Classify

schema lens inspect chain.json --protocol atproto

Each chain receives one of five optic classes. The class drives the gate:

ClassGate behavior
IsoAuto-merge. No governance review.
InjectionAuto-merge as forward-only.
ProjectionPR review required. Complement persistence required.
AffinePR review plus a dev.idiolect.recommendation from a recognised reviewer.
GeneralManual lens authoring. Coercion-law check, verification gate, and recommendation all required.

The class also feeds dev.idiolect.dialect#deprecations: any non-Iso lens revision implicitly deprecates the previous schema and must populate deprecations with the lens at-uri as replacement.

Stage 3 — Coercion-law check

For any CoerceType step crossing primitive kinds:

schema theory check-coercion-laws theory.ncl --json

Sample-based; exit code is non-zero on any falsifying sample. The chain declares each CoerceType with an honest CoercionClass. Dishonest declarations corrupt the asymmetric-lens put law silently; this gate catches them.

Stage 4 — Roundtrip verification

schema lens verify <corpus>/ --protocol atproto --schema <new>.json --chain chain.json

Checks GetPut and PutGet over the corpus. The corpus is the live indexer's catalog snapshot at the time of revision: actual records published across the network for that NSID. CI fails on any record that violates either law. Verification is grounded in real data, not synthetic test cases.

Stage 5 — Publish

schema lens inspect chain.json --json | idiolect-cli publish-lens \
  --collection dev.panproto.schema.lens

The verified chain serializes to Nickel via panproto-lens-dsl and ships as a dev.panproto.schema.lens record from idiolect's DID. The lens at-uri is added to:

  • The new lexicon revision's dev.idiolect.dialect#preferredLenses.
  • The previous lexicon's deprecations block as replacement.

Codegen re-runs on revision bump; downstream consumers pulling the dialect record see the lens automatically.

Vocab edits go through the same pipeline

Vocab edits are record edits, not schema edits, but the same six stages apply via dependent optics:

EditClassAction
Add node, add edge(no migration needed)Open enums tolerate. Stage 4 corpus regression only.
Remove nodeProjectionFull pipeline.
Rename nodeIsoRenameEdgeName over consumers. Auto-merge.
Add equivalent_to between vocabs(free)Triggers a derived lens automatically lifted into the orchestrator's mapEnum cache.

Why this is modular

  • Modular. Each elementary protolens is a stand-alone, well-typed combinator. The pipeline composes them; no bespoke migration code per revision.
  • Abstract. Protolenses are quantified over schemas, not specific to a revision pair. RenameField("oldName", "newName") is a schema-parametric morphism; it applies to the lexicon and to every record across the network without per-record code.
  • Composable. Chain auto-simplification, ScopedTransform sub-chains, Nickel record merge for fragments, symmetric lenses for forward / backward pairing, lift across protocols via theory morphisms.
  • Verifiable. Optic classification is mechanical. Coercion-law checks are sample-based. Corpus regression uses real records. Trust comes from the substrate, not from review prose.
  • Decentralized. Communities author their own protolens chains for their own lexicons. The policy applies to anyone adopting idiolect's framework.

Vendored externals

Vendored lexicons (Blacksky, layers-pub, ...) are consumed as schemas idiolect does not own. When they revise, the same six stages run against their old / new pair. The output is a symmetric lens (per panproto's symmetricLens): syncing A → B and B → A keeps both sides consistent up to complement. This is the right shape for a bridge crate (e.g. the planned idiolect-acorn): when the upstream changes, the bridge auto- updates and downstream idiolect records remain syncable.

Tooling

  • scripts/lexicon-evolve.sh <nsid> <old> <new> runs stages 0–5 in sequence.
  • .github/workflows/lexicon-evolution.yml runs stages 3–4 on PR; stage 5 on tagged release.
  • A pre-commit hook runs stages 0–2 locally on every lexicon edit.
  • migrations/<nsid>/<old>-<new>/{diff.json, chain.ncl, hints.json, classification.json, verification.json} is the per-revision audit trail.

The policy is what makes lexicon evolution reviewable. The tooling is what makes the policy cheap to follow.