Open enums and vocabularies

Closed enums are the wrong default for a federated lexicon. Adding a value should not require coordinating with every consumer; refusing to accept a new value should not be the default.

idiolect's policy is: every enum-shaped field is open, and the extension story is mechanical.

The wire shape

An open-enum field carries knownValues and a sibling *Vocab reference:

"kind": {
  "type": "string",
  "knownValues": ["subprocess", "http", "wasm"],
  "description": "Slug; resolves as a node in the vocab referenced by `kindVocab`."
},
"kindVocab": {
  "type": "ref",
  "ref": "dev.idiolect.defs#vocabRef",
  "description": "Vocabulary record whose nodes constitute the open extension."
}

kindVocab is optional. When omitted, the canonical idiolect-published vocab for that field is the implicit default. A community-published vocab listed here extends the slugs the field accepts.

The codegen shape

idiolect-codegen reads knownValues and emits:

#![allow(unused)]
fn main() {
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Kind {
    Subprocess,
    Http,
    Wasm,
    Other(String),
}
}

Plus hand-written Serialize / Deserialize impls that round-trip unknown slugs through Other(String). The TypeScript half emits 'a' | 'b' | (string & {}) for the same purpose.

Three helper methods sit on every emitted open-enum type:

#![allow(unused)]
fn main() {
impl Kind {
    pub fn is_subsumed_by(
        &self,
        graph: &VocabGraph,
        ancestor: &str,
    ) -> bool { /* ... */ }

    pub fn satisfies(
        &self,
        graph: &VocabGraph,
        relation: &str,
        target: &str,
    ) -> bool { /* ... */ }

    pub fn translate_to<T: From<String>>(
        &self,
        src_vocab_uri: &str,
        tgt_vocab_uri: &str,
        registry: &VocabRegistry,
    ) -> Option<T> { /* ... */ }
}
}

These are what consumers call instead of comparing strings. A consumer asking "is this kind a subprocess?" calls k.is_subsumed_by(&vocab, "subprocess") and gets true for any slug the vocab declares as subsumed_by subprocess (docker-run, fly-machines-launch, etc.) without changing the consumer's code.

Why this shape

Closed enums force a coordination problem: adding a value requires every consumer to upgrade their code before any producer publishes the new value. Open enums turn it into a vocabulary problem: the producer publishes the vocab, the consumer queries the vocab at runtime, and unknown slugs degrade gracefully to Other(String) when the consumer has not loaded the vocab.

The cost is one extra indirection per slug interpretation. The shipped VocabRegistry caches vocabs by at-uri, so the cost is amortized across the process lifetime.

What stays closed

A few fields are intentionally closed. They are meta-policy fields where extending the value space would change the runtime's contract, not the data:

  • vocab.world (open / closed-with-default / hierarchy-closed) controls the runtime's open-enum policy itself.
  • lensClass (isomorphism / injection / projection / affine / general) is a panproto contract; extending it changes what the runtime promises.
  • recordHosting (member-hosted / community-hosted / hybrid) controls a federation policy.

A new value here is a runtime change, not a record change.

Migration

Converting a closed enum to an open enum is wire-compatible: existing records continue to validate, and the codegen-emitted helpers degrade to "if Other, ignore" in consumers that have not regenerated. Going the other way is breaking; the shipped lexicons do not do that.

Codegen identifier collisions

When two distinct slugs would pascal-case to the same Rust identifier (foo-bar and foo_bar), the second occurrence gets a numeric suffix (FooBar2). The collision is resolved deterministically per lexicon, so two regenerations of the same lexicon produce the same identifier names. The collision report is printed at codegen time so authors can rename a slug when the generated name is awkward.