Compatibility

How schema changes are classified and what bumps are required.

A published schema is consumer-facing. Silently changing it would break consumers, so Esker has a compat engine that diffs the proposed schema against the last published one and decides what's allowed.

The engine lives in esker.schemas.compat (pure functions, no I/O); the I/O wrapping for the push gate lives in esker.client.compat. Both esker check and esker push route through the same logic.

What the engine does

For every push at a non-major version bump, the engine:

  1. Fetches the prior schema.json from the hub.
  2. Normalizes both schemas: inlines $ref from $defs, strips doc-only keys (title, description, default, $defs, examples), canonicalizes shape variants.
  3. Walks properties and required recursively, classifying each change as breaking or additive.
  4. Decides the minimum required SemVer bump.
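
Step 2 can be sketched roughly as follows. This is a minimal illustration, not the code in esker.schemas.compat: the names normalize and DOC_ONLY_KEYS are hypothetical, only local "#/$defs/..." references are handled, definitions are assumed to be non-recursive, and the "canonicalizes shape variants" part is left out.

```python
from typing import Any

# Doc-only keys listed in the text; they never affect compatibility.
DOC_ONLY_KEYS = {"title", "description", "default", "$defs", "examples"}


def normalize(schema: dict[str, Any]) -> dict[str, Any]:
    """Inline local $ref pointers and drop doc-only keys (hypothetical helper)."""
    defs = schema.get("$defs", {})

    def walk(node: Any) -> Any:
        if isinstance(node, list):
            return [walk(item) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            # Replace the reference with the definition it points at.
            # Assumption: definitions are not recursive (a cycle would not terminate).
            return walk(defs[ref.removeprefix("#/$defs/")])
        out: dict[str, Any] = {}
        for key, value in node.items():
            if key in DOC_ONLY_KEYS:
                continue
            if key == "properties" and isinstance(value, dict):
                # Keys under "properties" are field names, not schema keywords,
                # so a field called "title" or "default" must survive.
                out[key] = {name: walk(sub) for name, sub in value.items()}
            else:
                out[key] = walk(value)
        return out

    return walk(schema)
```

Normalizing both sides first means the later walk compares only structure, so a reworded description or a definition moved into $defs does not show up as a change.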

The output is a CompatReport:

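from dataclasses import dataclass
from typing import Literal


@dataclass
class CompatReport:
    breaking: list[str]  # one message per breaking change
    additive: list[str]  # one message per additive change
    required_bump: Literal["patch", "minor", "major"]

    @property
    def compatible(self) -> bool:
        # Compatible means no breaking changes; additive changes are fine.
        return not self.breaking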

Mapping:

  • Any breaking → required_bump = "major".
  • Only additive → required_bump = "minor".
  • Nothing → required_bump = "patch".
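
As a sketch, the mapping amounts to a three-way branch. The function name required_bump_for and the Bump alias are hypothetical; only the mapping itself comes from the list above.

```python
from typing import Literal

Bump = Literal["patch", "minor", "major"]


def required_bump_for(breaking: list[str], additive: list[str]) -> Bump:
    """Map the two change lists to the minimum SemVer bump."""
    if breaking:
        return "major"  # any breaking change forces a major bump
    if additive:
        return "minor"  # additive changes only
    return "patch"      # nothing compat-relevant changed
```

Breaking changes take precedence: a diff with both breaking and additive entries still requires "major".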

Classification rules

For each field, in order:

  1. In old, not in new → breaking (field 'X' removed).
  2. In new, not in old → additive if optional, breaking if required (field 'X' added [as required]).
  3. Required toggle: optional → required is breaking. Required → optional is additive.
  4. Type signature mismatch → breaking (field 'X': string → integer).
  5. Same object type → recurse into the nested schema.
  6. Same array<object> type → recurse into items.
  7. anyOf on both sides → pair the unique object/array members and recurse.
  8. Otherwise → constraint diff.
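
Rules 1 to 5 can be sketched as below for two already-normalized object schemas. This is an illustration under assumptions, not the engine's code: classify_fields is a hypothetical name, the wording of the required-toggle messages is invented, a required toggle is recorded and checking then continues to the type comparison, and rules 6 to 8 (arrays, anyOf, constraint diff) are omitted.

```python
from typing import Any


def classify_fields(
    old: dict[str, Any], new: dict[str, Any], path: str = ""
) -> tuple[list[str], list[str]]:
    """Return (breaking, additive) messages for two normalized object schemas."""
    breaking: list[str] = []
    additive: list[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_req = set(old.get("required", []))
    new_req = set(new.get("required", []))

    for name in sorted(old_props.keys() | new_props.keys()):
        label = f"{path}{name}"
        if name not in new_props:  # rule 1: removed
            breaking.append(f"field '{label}' removed")
            continue
        if name not in old_props:  # rule 2: added
            if name in new_req:
                breaking.append(f"field '{label}' added as required")
            else:
                additive.append(f"field '{label}' added")
            continue
        # Rule 3: required toggle (message wording is an assumption).
        if name in new_req and name not in old_req:
            breaking.append(f"field '{label}' became required")
        elif name in old_req and name not in new_req:
            additive.append(f"field '{label}' became optional")
        old_type = old_props[name].get("type")
        new_type = new_props[name].get("type")
        if old_type != new_type:  # rule 4: type mismatch
            breaking.append(f"field '{label}': {old_type} → {new_type}")
        elif old_type == "object":  # rule 5: recurse into nested object
            nested_b, nested_a = classify_fields(
                old_props[name], new_props[name], f"{label}."
            )
            breaking += nested_b
            additive += nested_a
    return breaking, additive
```

Nested fields are reported with a dotted path (for example geo.lon) so that a message identifies where in the record the change occurred; the dotted form is a choice made for this sketch.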

Constraint diff

change                            classification
enum value removed                breaking
enum value added                  additive
enum keyword added/removed        breaking
pattern changed (any way)         breaking
format changed (any way)          breaking
minLength / minimum tightened     breaking
minLength / minimum relaxed       additive
maxLength / maximum tightened     breaking
maxLength / maximum relaxed       additive

Pattern and format changes are always breaking, even if the new pattern is strictly looser. The engine does not analyze regular expressions to decide whether one accepts everything the other did.
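
The constraint rules can be sketched as a comparison of two leaf schemas of the same type. The function name diff_constraints and the message strings are hypothetical (apart from "enum keyword added or removed"), and one case the rules do not spell out is decided by assumption here: a bound that appears counts as tightened, and a bound that disappears counts as relaxed.

```python
from typing import Any

LOWER_BOUNDS = ("minLength", "minimum")
UPPER_BOUNDS = ("maxLength", "maximum")


def diff_constraints(
    old: dict[str, Any], new: dict[str, Any]
) -> tuple[list[str], list[str]]:
    """Return (breaking, additive) messages for two same-typed leaf schemas."""
    breaking: list[str] = []
    additive: list[str] = []

    # enum: the keyword must be present on both sides or on neither.
    if ("enum" in old) != ("enum" in new):
        breaking.append("enum keyword added or removed")
    elif "enum" in old:
        removed = [v for v in old["enum"] if v not in new["enum"]]
        added = [v for v in new["enum"] if v not in old["enum"]]
        if removed:
            breaking.append(f"enum values removed: {removed}")
        if added:
            additive.append(f"enum values added: {added}")

    # pattern / format: any change is breaking, with no regex analysis.
    for key in ("pattern", "format"):
        if old.get(key) != new.get(key):
            breaking.append(f"{key} changed")

    for key in LOWER_BOUNDS + UPPER_BOUNDS:
        before, after = old.get(key), new.get(key)
        if before == after:
            continue
        if after is None:
            tightened = False  # assumption: dropping a bound relaxes it
        elif before is None:
            tightened = True   # assumption: introducing a bound tightens it
        else:
            raised = after > before
            # Raising a lower bound tightens; raising an upper bound relaxes.
            tightened = raised if key in LOWER_BOUNDS else not raised
        if tightened:
            breaking.append(f"{key} tightened")
        else:
            additive.append(f"{key} relaxed")
    return breaking, additive
```

For example, moving minLength from 1 to 2 rejects values that used to validate, so it lands in the breaking list, while moving maxLength from 10 to 20 only admits more values and lands in the additive list.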

What's not diffed

The engine short-circuits in three cases. Each case sets a skip_reason instead of running the diff.

skip reason      meaning
first_publish    No prior manifest on the hub. Allowed.
major_skip       Major version bump. A different schema is expected; no diff to do.
grandfather      Prior manifest exists but no schema.json artifact. Common with old datasets. Allowed.

Doc-only fields are also ignored: title, description, default, $defs, examples. So changing a description or a default value is invisible to compat.

Literal[X] collapses to {"const": X} in the JSON Schema that Pydantic emits; Literal[X, Y] becomes {"enum": [X, Y]}. The diff special-cases enum, so a transition between a single-element and a multi-element literal can render with a slightly less clear message ("enum keyword added or removed") even though the classification is correct.

The push gate

esker push calls enforce(...), which raises CompatError when the push should be blocked.

Decision tree:

  • first_publish → allowed.
  • Same version (declared_bump == "none"):
    • If no prior schema.json → allowed (first artifact for this version).
    • If schema unchanged → allowed.
    • If schema changed at all → blocked with <owner>/<name>@<v> already published with a different schema; bump schema_version.
  • Major bump:
    • With --force-major → allowed.
    • Without → blocked with major bump <a> → <b> requires --force-major.
  • grandfather → allowed.
  • Patch / minor:
    • Declared bump covers required → allowed.
    • Required bump exceeds declared → blocked with required bump: <required> and the breaking changes listed.
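
The decision tree can be sketched as one function that either returns or raises. This is an illustration only: the real signature of enforce(...) is not shown here, so the function name gate, every parameter name, and the shortened error messages below are hypothetical, and CompatError is redefined as a stand-in.

```python
BUMP_RANK = {"patch": 0, "minor": 1, "major": 2}


class CompatError(Exception):
    """Stand-in for the error that blocks a push."""


def gate(
    declared_bump: str,  # "none", "patch", "minor", or "major"
    has_prior_manifest: bool,
    has_prior_schema: bool,
    schema_changed: bool,
    required_bump: str = "patch",
    breaking: tuple[str, ...] = (),
    force_major: bool = False,
) -> None:
    """Return if the push is allowed; raise CompatError if it is blocked."""
    if not has_prior_manifest:  # first_publish
        return
    if declared_bump == "none":  # same version
        if not has_prior_schema or not schema_changed:
            return
        raise CompatError(
            "already published with a different schema; bump schema_version"
        )
    if declared_bump == "major":
        if force_major:
            return
        raise CompatError("major bump requires --force-major")
    if not has_prior_schema:  # grandfather
        return
    # Patch / minor: the declared bump must cover the required one.
    if BUMP_RANK[required_bump] > BUMP_RANK[declared_bump]:
        details = "; ".join(breaking)
        raise CompatError(f"required bump: {required_bump} ({details})")
```

The branches are checked in the same order as the bullets above, which matters: a major bump without --force-major is blocked even when the prior version has no schema.json, because the major check comes before the grandfather check.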

A green esker check ≈ "push won't be blocked by compat." Both call the same diagnose function.

Same-version re-publish

A push at the same schema_version is only allowed if the schema hasn't changed at all (no breaking, no additive). Re-publishing v1.0.0 with a tweaked field type fails with the "already published with a different schema" error. Bump the version.

The exception is when the hub has no schema.json for that version yet (a grandfather-like case): the re-publish then lands as the first schema artifact for that version.

See also

  • Publishing: esker check and esker push walkthroughs
  • Manifests: what schema_version is and where it lives
  • Records: Literal vs pattern for enum-like fields