# Compatibility

> How schema changes are classified and what bumps are required.

A published schema is consumer-facing. Silently changing it would break consumers, so Esker has a compat engine that diffs the proposed schema against the last published one and decides what's allowed.

The engine lives in `esker.schemas.compat` (pure functions, no I/O); the I/O wrapping for the push gate lives in `esker.client.compat`. Both `esker check` and `esker push` route through the same logic.

## What the engine does

For every push at a non-major version bump, the engine:

1. Fetches the prior `schema.json` from the hub.
2. Normalizes both schemas: inlines `$ref` from `$defs`, strips doc-only keys (`title`, `description`, `default`, `$defs`, `examples`), canonicalizes shape variants.
3. Walks `properties` and `required` recursively, classifying each change as `breaking` or `additive`.
4. Decides the minimum required SemVer bump.
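Step 2 is the least obvious of these. Below is a minimal sketch of the `$ref` inlining and doc-key stripping. It assumes all refs are local (`#/$defs/<Name>`), that definitions sit at the top level and aren't recursive, and that sibling keys next to a `$ref` can be dropped. `normalize` and `DOC_ONLY_KEYS` are hypothetical names, not the API of `esker.schemas.compat`. Shape canonicalization is left out because the text doesn't say which variants it covers.

```python
from typing import Any, Optional

# Doc-only keys the engine ignores (list taken from the text above).
DOC_ONLY_KEYS = {"title", "description", "default", "$defs", "examples"}


def normalize(schema: Any, defs: Optional[dict] = None) -> Any:
    """Hypothetical sketch: inline local $refs and strip doc-only keys.

    Assumes refs look like "#/$defs/<Name>", all definitions live in the
    top-level "$defs", and none are recursive. Keys next to "$ref" are dropped.
    """
    if defs is None and isinstance(schema, dict):
        # Top-level call: grab the definitions before "$defs" is stripped.
        defs = schema.get("$defs", {})
    if isinstance(schema, list):
        return [normalize(item, defs) for item in schema]
    if not isinstance(schema, dict):
        return schema
    if "$ref" in schema:
        name = schema["$ref"].removeprefix("#/$defs/")
        return normalize(defs[name], defs)
    out = {}
    for key, value in schema.items():
        if key in DOC_ONLY_KEYS:
            continue
        if key in ("enum", "const"):
            out[key] = value  # literal values, not subschemas: keep verbatim
        elif key == "properties":
            # Keys here are field names (a field may be called "title"), not keywords.
            out[key] = {name: normalize(sub, defs) for name, sub in value.items()}
        else:
            out[key] = normalize(value, defs)
    return out
```

Keeping the `properties` map unfiltered matters: a record with a field literally named `title` or `description` must not lose that field during normalization.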

The output is a `CompatReport`:

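```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class CompatReport:
    breaking: list[str]  # human-readable breaking changes, e.g. "field 'X' removed"
    additive: list[str]  # backward-compatible changes
    required_bump: Literal["patch", "minor", "major"]  # minimum SemVer bump

    @property
    def compatible(self) -> bool:
        # Compatible means nothing breaking; additive changes are fine.
        return not self.breaking
```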

Mapping:

- Any `breaking` → `required_bump = "major"`.
- Only `additive` → `required_bump = "minor"`.
- Nothing → `required_bump = "patch"`.
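The mapping is a three-way precedence in which any breaking change outranks any number of additive ones. As a standalone function it might look like this; `required_bump` here is a hypothetical helper, and the engine may well compute the value inline.

```python
from typing import Literal

Bump = Literal["patch", "minor", "major"]


def required_bump(breaking: list[str], additive: list[str]) -> Bump:
    """Hypothetical helper: minimum SemVer bump for a set of classified changes."""
    if breaking:
        return "major"  # a single breaking change forces a major bump
    if additive:
        return "minor"  # only backward-compatible additions
    return "patch"  # nothing schema-relevant changed
```

For example, `required_bump(["field 'id' removed"], ["field 'tags' added"])` is `"major"`: the additive change doesn't soften the breaking one.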

## Classification rules

For each field, in order:

1. **In old, not in new** → breaking (`field 'X' removed`).
2. **In new, not in old** → additive if optional, breaking if required (`field 'X' added [as required]`).
3. **Required toggle**: optional → required is breaking. Required → optional is additive.
4. **Type signature mismatch** → breaking (`field 'X': string → integer`).
5. **Same `object` type** → recurse into the nested schema.
6. **Same `array<object>` type** → recurse into `items`.
7. **`anyOf` on both sides** → pair the unique object/array members and recurse.
8. **Otherwise** → constraint diff.
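Rules 1 through 6 can be sketched as a recursive walk over `properties` and `required`. This is an illustration under assumptions, not the engine's code: `type_signature` and `diff_object` are hypothetical, the required-toggle messages are invented (the text only gives wording for removals, additions, and type changes), a required toggle doesn't stop the type check, and `anyOf` pairing (rule 7) and the constraint diff (rule 8) are left out.

```python
from typing import Any


def type_signature(schema: dict[str, Any]) -> str:
    """Hypothetical signature: "array<T>" for arrays, else the "type" keyword."""
    t = schema.get("type", "any")
    if t == "array":
        return f"array<{type_signature(schema.get('items', {}))}>"
    return str(t)


def diff_object(old: dict, new: dict, breaking: list[str], additive: list[str], path: str = "") -> None:
    """Classify field changes per rules 1-6, appending messages to the two lists."""
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))

    for name in sorted(old_props.keys() | new_props.keys()):
        field = f"{path}{name}"
        if name not in new_props:  # rule 1: removed
            breaking.append(f"field '{field}' removed")
            continue
        if name not in old_props:  # rule 2: added, breaking only if required
            if name in new_req:
                breaking.append(f"field '{field}' added as required")
            else:
                additive.append(f"field '{field}' added")
            continue
        # Rule 3: required toggle (messages are hypothetical).
        if name in new_req and name not in old_req:
            breaking.append(f"field '{field}' became required")
        elif name in old_req and name not in new_req:
            additive.append(f"field '{field}' became optional")

        old_sig, new_sig = type_signature(old_props[name]), type_signature(new_props[name])
        if old_sig != new_sig:  # rule 4: type signature mismatch
            breaking.append(f"field '{field}': {old_sig} → {new_sig}")
        elif old_sig == "object":  # rule 5: recurse into the nested object
            diff_object(old_props[name], new_props[name], breaking, additive, f"{field}.")
        elif old_sig == "array<object>":  # rule 6: recurse into items
            diff_object(old_props[name]["items"], new_props[name]["items"],
                        breaking, additive, f"{field}[].")
```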

### Constraint diff

| change                            | classification |
| --------------------------------- | -------------- |
| enum value removed                | breaking       |
| enum value added                  | additive       |
| enum keyword added/removed        | breaking       |
| pattern changed (any way)         | breaking       |
| format changed (any way)          | breaking       |
| `minLength` / `minimum` tightened | breaking       |
| `minLength` / `minimum` relaxed   | additive       |
| `maxLength` / `maximum` tightened | breaking       |
| `maxLength` / `maximum` relaxed   | additive       |
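The table translates almost row for row into code. The sketch below is hypothetical (`diff_constraints` and its messages are not the engine's), and it assumes a missing lower bound behaves like negative infinity and a missing upper bound like positive infinity, so adding a bound counts as tightening and removing one as relaxing.

```python
from typing import Any

LOWER_BOUNDS = ("minLength", "minimum")  # raising these tightens
UPPER_BOUNDS = ("maxLength", "maximum")  # lowering these tightens


def diff_constraints(field: str, old: dict[str, Any], new: dict[str, Any],
                     breaking: list[str], additive: list[str]) -> None:
    """Hypothetical constraint diff following the classification table."""
    # enum: keyword presence flips are breaking; otherwise compare values.
    if ("enum" in old) != ("enum" in new):
        breaking.append(f"field '{field}': enum keyword added or removed")
    elif "enum" in old:
        removed = [v for v in old["enum"] if v not in new["enum"]]
        added = [v for v in new["enum"] if v not in old["enum"]]
        if removed:
            breaking.append(f"field '{field}': enum values removed {removed}")
        if added:
            additive.append(f"field '{field}': enum values added {added}")
    # pattern / format: any change at all is breaking.
    for key in ("pattern", "format"):
        if old.get(key) != new.get(key):
            breaking.append(f"field '{field}': {key} changed")
    # Bounds: an absent bound is treated as unbounded (assumption).
    for key in LOWER_BOUNDS:
        o, n = old.get(key, float("-inf")), new.get(key, float("-inf"))
        if n > o:
            breaking.append(f"field '{field}': {key} tightened")
        elif n < o:
            additive.append(f"field '{field}': {key} relaxed")
    for key in UPPER_BOUNDS:
        o, n = old.get(key, float("inf")), new.get(key, float("inf"))
        if n < o:
            breaking.append(f"field '{field}': {key} tightened")
        elif n > o:
            additive.append(f"field '{field}': {key} relaxed")
```

Note that the catch-all `^.*$` pattern added below is still reported as breaking: the sketch, like the rule it illustrates, never compares what two patterns accept.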

Pattern and format changes are **always** breaking, even if the new pattern is strictly looser. The engine doesn't try to decide statically whether one regex accepts a superset of the strings another accepts.

## What's not diffed

The engine short-circuits in three cases. Each case sets a `skip_reason` instead of running the diff.

| skip reason     | meaning                                                                                 |
| --------------- | --------------------------------------------------------------------------------------- |
| `first_publish` | No prior manifest on the hub. Allowed.                                                  |
| `major_skip`    | Major version bump. Different schema is expected; no diff to do.                        |
| `grandfather`   | Prior manifest exists but no `schema.json` artifact. Common with old datasets. Allowed. |

Doc-only keys are also ignored: `title`, `description`, `default`, `$defs`, `examples`. So changing a description or a default value is invisible to compat.

`Literal[X]` collapses to `{"const": X}` in Pydantic's JSON Schema output; `Literal[X, Y]` becomes `{"enum": [X, Y]}`. The diff special-cases `enum`, so a transition between a single-element and a multi-element literal can surface with a less specific message ("enum keyword added or removed"), even though the classification is correct.

## The push gate

`esker push` calls `enforce(...)`, which raises `CompatError` when the push should be blocked.

Decision tree:

- `first_publish` → allowed.
- Same version (`declared_bump == "none"`):
  - If no prior `schema.json` → allowed (first artifact for this version).
  - If schema unchanged → allowed.
  - If schema changed at all → blocked with `<owner>/<name>@<v> already published with a different schema; bump schema_version`.
- Major bump:
  - With `--force-major` → allowed.
  - Without → blocked with `major bump <a> → <b> requires --force-major`.
- `grandfather` → allowed.
- Patch / minor:
  - Declared bump is at least the required bump → allowed.
  - Required bump exceeds declared → blocked with `required bump: <required>` and the breaking changes listed.
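The decision tree reads naturally as a chain of guard clauses. This is a sketch under stated assumptions: `PushContext`, `gate`, and the `RANK` ordering are hypothetical stand-ins (the real `enforce(...)` signature isn't documented on this page), `CompatError` is defined locally, and the layout of the final "required bump" error is a guess. The other two messages follow the text.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ordering used for "declared bump is at least the required bump".
RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


class CompatError(Exception):
    """Raised when compat should block the push."""


@dataclass
class PushContext:
    """Hypothetical bundle of gate inputs."""
    ref: str                      # "<owner>/<name>@<version>"
    prior_version: Optional[str]  # None when the hub has no prior manifest
    version: str
    declared_bump: str            # "none" | "patch" | "minor" | "major"
    has_prior_schema: bool        # does the hub hold a prior schema.json?
    schema_changed: bool          # any breaking or additive change at all
    required_bump: str            # from the CompatReport
    breaking: list[str] = field(default_factory=list)
    force_major: bool = False


def gate(ctx: PushContext) -> None:
    """Return None if the push is allowed; raise CompatError if it's blocked."""
    if ctx.prior_version is None:
        return  # first_publish
    if ctx.declared_bump == "none":
        if not ctx.has_prior_schema or not ctx.schema_changed:
            return  # first schema artifact for this version, or identical schema
        raise CompatError(f"{ctx.ref} already published with a different schema; bump schema_version")
    if ctx.declared_bump == "major":
        if ctx.force_major:
            return
        raise CompatError(f"major bump {ctx.prior_version} → {ctx.version} requires --force-major")
    if not ctx.has_prior_schema:
        return  # grandfather
    if RANK[ctx.declared_bump] >= RANK[ctx.required_bump]:
        return  # declared bump covers the required one
    raise CompatError("\n".join([f"required bump: {ctx.required_bump}", *ctx.breaking]))
```

The order of the guards mirrors the list: the same-version branch runs before the grandfather check, so a same-version push with no prior `schema.json` is allowed by that branch rather than falling through.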

A green `esker check` is a close proxy for "push won't be blocked by compat": both call the same `diagnose` function.

## Same-version re-publish

A push at the same `schema_version` is **only allowed** if the schema hasn't changed at all (no breaking, no additive). Re-publishing v1.0.0 with a tweaked field type fails with the "already published with a different schema" error. Bump the version.

The exception is when the hub has no `schema.json` for that version yet (a grandfather-like case): the re-publish then lands as the first schema artifact for that version.

## See also

- [Publishing](https://esker.so/docs/sdk/publishing.md) — `esker check` and `esker push` walkthroughs
- [Manifests](https://esker.so/docs/protocol/manifests.md) — what `schema_version` is and where it lives
- [Records](https://esker.so/docs/sdk/records.md) — `Literal` vs `pattern` for enum-like fields
