# Records

> EskerModel — the central record contract. Draft vs published, ClassVars vs fields.

`EskerModel` is the base class for every domain record in Esker. It's a Pydantic v2 `BaseModel` with two strong opinions: records are values, not objects (`frozen=True`), and unknown fields fail loudly (`extra="forbid"`).

If you use the [decorator](https://esker.so/docs/sdk/pipelines.md), you never write `EskerModel` directly — it's synthesized for you. If you reach for the [three-class form](https://esker.so/docs/sdk/three-class-form.md), you write the subclass explicitly.

## Shape

```python
from typing import ClassVar
from pydantic import BaseModel, ConfigDict


class EskerModel(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")

    DOMAIN_ID: ClassVar[str]
    schema_version: ClassVar[str]
```

Both are `ClassVar`s, **not Pydantic fields.** Declaring either one as a field is the most-flagged footgun in the codebase; see "ClassVar vs Pydantic field" below.

## Subclassing

```python
from typing import ClassVar
from datetime import date
from esker import EskerModel


class TreasuryYieldCurve(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.treasury.yields"
    schema_version: ClassVar[str] = "2.0.0"

    quote_date: date
    rate_1m: float | None = None
    rate_3m: float | None = None
    rate_6m: float | None = None
    rate_1y: float | None = None
```

`__init_subclass__` enforces:

- `DOMAIN_ID` must be in `cls.__dict__` (or inherited from a non-`EskerModel` base).
- `DOMAIN_ID` must match `^[a-z0-9]+(\.[a-z0-9]+)+$` (lowercase, dot-separated, ≥ 2 segments).
- Missing `DOMAIN_ID` raises `TypeError("X must declare a DOMAIN_ID class variable")`.

`schema_version` is read by `cls.declared_version()`. Missing → `ValueError("X must declare a schema_version class variable")`.
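
A minimal sketch of the `DOMAIN_ID` checks listed above, written against a plain Python class so it runs without Pydantic. `DomainCheckedSketch` and `try_declare` are hypothetical names, not part of Esker. The sketch only looks in `cls.__dict__` and skips the inherited-from-a-non-`EskerModel`-base case. The exception raised for a malformed id is an assumption, since only the missing-id error is documented here.

```python
import re
from typing import ClassVar

# Pattern quoted from the rules above: lowercase, dot-separated, at least 2 segments.
DOMAIN_ID_RE = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)+$")


class DomainCheckedSketch:
    """Hypothetical stand-in for EskerModel; illustrates the checks only."""

    DOMAIN_ID: ClassVar[str]

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        domain_id = cls.__dict__.get("DOMAIN_ID")
        if domain_id is None:
            raise TypeError(f"{cls.__name__} must declare a DOMAIN_ID class variable")
        if not DOMAIN_ID_RE.match(domain_id):
            # Assumption: the real exception type and message may differ.
            raise ValueError(f"{cls.__name__}.DOMAIN_ID {domain_id!r} is not a valid domain id")


def try_declare(domain_id):
    """Declare a throwaway subclass; return the error type name, or None on success."""
    attrs = {} if domain_id is None else {"DOMAIN_ID": domain_id}
    try:
        type("Probe", (DomainCheckedSketch,), attrs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None
```

Because the check runs in `__init_subclass__`, a bad declaration fails when the class is defined, not when the first record is built.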

## ClassVar vs Pydantic field

:::warn
This is the single most common mistake in new pipelines. Read this section even if you're using the decorator: it's worth knowing why the `ClassVar` form matters.
:::

The mistake:

```python
# WRONG — turns schema_version into a per-record field
class Bad(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.bad"
    schema_version: SemVer = "1.0.0"
```

The `schema_version: SemVer = ...` form silently:

1. Inflates parquet by roughly `len(version) * record_count` bytes before compression.
2. Breaks `cls.declared_version()` (returns a `FieldInfo`, not the string).
3. Pollutes the hash-stable JSON Schema.

Always use the `ClassVar` form:

```python
# RIGHT
class Good(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.good"
    schema_version: ClassVar[str] = "1.0.0"
```

The decorator path bypasses this trap entirely — it sets these as raw class attributes on a synthesized base, so the inherited `ClassVar` annotation wins.
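
The same annotation rule exists in the standard library's `dataclasses`, which makes the distinction easy to see in isolation. The sketch below is an analogy and not Pydantic itself: a bare annotation becomes a per-instance field, while a `ClassVar` annotation stays on the class. `BadSketch`, `GoodSketch`, and `field_names` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass(frozen=True)
class BadSketch:
    quote_date: str
    schema_version: str = "1.0.0"  # bare annotation: becomes a per-record field


@dataclass(frozen=True)
class GoodSketch:
    schema_version: ClassVar[str] = "1.0.0"  # ClassVar: lives on the class only
    quote_date: str


def field_names(cls):
    """Names of the per-instance fields the dataclass machinery collected."""
    return [f.name for f in fields(cls)]


print(field_names(BadSketch))   # schema_version shows up as a field
print(field_names(GoodSketch))  # only the domain field remains
```

In the `BadSketch` case every instance carries its own copy of the version, which is the per-row cost described in point 1.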

## Three classmethods to use

```python
@classmethod
def declared_version(cls) -> str: ...        # the schema_version string
@classmethod
def domain(cls) -> str: ...                  # "<DOMAIN_ID>@<version>"
@classmethod
def json_schema(cls) -> dict: ...            # cls.model_json_schema()
```

Use these instead of `model_fields["schema_version"].default` or `model_json_schema()` directly. They centralize Pydantic-internals access.
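
To make the return values concrete, here is a rough sketch of `declared_version()` and `domain()` on a plain class. `VersionedSketch` is a hypothetical stand-in that assumes nothing about Esker's internals beyond the behavior described on this page. `json_schema()` is left out because it depends on Pydantic.

```python
from typing import ClassVar


class VersionedSketch:
    """Hypothetical stand-in for EskerModel (no Pydantic, no json_schema())."""

    DOMAIN_ID: ClassVar[str]
    schema_version: ClassVar[str]

    @classmethod
    def declared_version(cls) -> str:
        version = getattr(cls, "schema_version", None)
        if not isinstance(version, str):
            raise ValueError(f"{cls.__name__} must declare a schema_version class variable")
        return version

    @classmethod
    def domain(cls) -> str:
        # "<DOMAIN_ID>@<version>", as described above
        return f"{cls.DOMAIN_ID}@{cls.declared_version()}"


class TreasuryYieldCurve(VersionedSketch):
    DOMAIN_ID: ClassVar[str] = "us.treasury.yields"
    schema_version: ClassVar[str] = "2.0.0"


print(TreasuryYieldCurve.domain())
```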

## Draft vs Published

`EskerModel` is the **draft** shape: only your domain fields. The **published** shape is generated on demand by a classmethod:

```python
@classmethod
def published(cls) -> type["EskerModel"]: ...
```

`published()` generates the class once and caches it for the life of the process. It returns a class named `Published<X>` with three extra fields:

| field              | type               | meaning                                          |
| ------------------ | ------------------ | ------------------------------------------------ |
| `esker_id`         | `EskerID` (string) | join key — same entity → same id across datasets |
| `esker_source_url` | `HttpUrl`          | per-record canonical URL                         |
| `esker_lineage_id` | `UUID`             | links to a row in `lineage.json`                 |

Only `EskerPipeline.run()` calls `.published()(...)` to mint records. User code should not. The TypeScript renderer strips the `Published` prefix so consumers see `<X>`.
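
The "generated and process-cached" behavior can be sketched with `functools.lru_cache` and a dynamically built subclass. This illustrates the caching pattern only: `DraftSketch` is a hypothetical stand-in, and how Esker actually builds the published class is not described on this page.

```python
import functools
from uuid import UUID


class DraftSketch:
    """Hypothetical stand-in for a draft record class."""

    @classmethod
    @functools.lru_cache(maxsize=None)
    def published(cls) -> type:
        # Build a subclass named Published<X> that annotates the three injected
        # fields. The cache is keyed on cls, so each draft class gets one
        # published class per process.
        injected = {"esker_id": str, "esker_source_url": str, "esker_lineage_id": UUID}
        return type(f"Published{cls.__name__}", (cls,), {"__annotations__": injected})


class SecCompany(DraftSketch):
    cik: str


Pub = SecCompany.published()
print(Pub.__name__)
```

Returning the same class object on every call matters because anything that compares or caches by class identity would otherwise see a new type each time.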

## Injected fields

The three `esker_*`-prefixed fields are **synthesized at run time** by the framework — never set them yourself.

| field              | source                                                          |
| ------------------ | --------------------------------------------------------------- |
| `esker_id`         | f-string: `esker:<jurisdiction>:<entity_type>:<native_id>`      |
| `esker_source_url` | `_SOURCE_URL_TEMPLATE.format(**fields)` or `Fetched.source_url` |
| `esker_lineage_id` | UUID per `(source_url, fetched_at)` batch                       |

`jurisdiction` is the first segment of `DOMAIN_ID`. For `us.sec.companies` it's `us`.
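
As a worked illustration of the table above, the helpers below assemble an `esker_id` and format a source URL. All three function names are hypothetical. Where `entity_type` and `native_id` come from is not covered on this page, so the sketch takes them as plain arguments. The URL template in the usage line is a placeholder.

```python
def jurisdiction_of(domain_id: str) -> str:
    """First dot-separated segment of DOMAIN_ID."""
    return domain_id.split(".", 1)[0]


def make_esker_id(domain_id: str, entity_type: str, native_id: str) -> str:
    # Assumption: entity_type and native_id are supplied by the caller.
    return f"esker:{jurisdiction_of(domain_id)}:{entity_type}:{native_id}"


def make_source_url(template: str, fields: dict) -> str:
    # Mirrors _SOURCE_URL_TEMPLATE.format(**fields); keys the template
    # does not reference are ignored by str.format.
    return template.format(**fields)


print(make_esker_id("us.sec.companies", "company", "0000320193"))
print(make_source_url("https://example.com/cik/{cik}", {"cik": "0000320193"}))
```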

The `extra="forbid"` config means trying to set `esker_id` on the draft raises:

```
ValidationError: 1 validation error for SecCompany
esker_id
  Extra inputs are not permitted [type=extra_forbidden, ...]
```

Good — you can't bypass the injection.

## What's not a record field

These belong on the [manifest](https://esker.so/docs/protocol/manifests.md), not on rows:

- `schema_version`: repeating it on every row is wasteful
- `source_id`
- `ingested_at`

## Type recommendations

- **`Annotated[str, Field(pattern=...)]`** for constrained strings (CIK, slug, ISO codes). Pydantic validates on construction; the parquet stores plain strings.
- **`Literal["a", "b"]` over `Field(pattern=r"^(a|b)$")`** for enum-like fields. Pydantic emits `Literal` as JSON Schema `enum`, which the [compat checker](https://esker.so/docs/protocol/compatibility.md) can set-diff (additions = minor, removals = major). A pattern emits as opaque text — any text change reads as breaking.
- **`AwareDatetime`, not naive `datetime`.** Naive datetimes are a bug category Esker refuses to participate in.
- **`date` over `datetime` for date-only values.** Lands in parquet as Arrow `date32()` — consumers don't have to parse strings.
- **`float | None` over `Optional[float]`** for nullability — the JSON Schema is identical, the form reads cleaner.
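
The `Literal` recommendation pays off because enum members can be compared as sets. A minimal sketch of that set-diff follows, assuming only the rule stated above (additions = minor, removals = major). `classify_enum_change` is a hypothetical helper, the `"none"` label is an assumption, and the real checker's rules live on the compatibility page.

```python
def classify_enum_change(old: set, new: set) -> str:
    """Classify a change between two sets of allowed enum values."""
    removed = old - new
    added = new - old
    if removed:
        # Any removal is breaking, even if values were also added.
        return "major"
    if added:
        return "minor"
    return "none"  # assumption: label for "no change"


print(classify_enum_change({"a", "b"}, {"a", "b", "c"}))
print(classify_enum_change({"a", "b"}, {"a"}))
```

A regex pattern offers no equivalent comparison: two different pattern strings may or may not accept the same values, so a checker that only compares text has to treat any edit as breaking.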

## See also

- [Pipelines](https://esker.so/docs/sdk/pipelines.md) — the decorator path that synthesizes `EskerModel` for you
- [Three-class form](https://esker.so/docs/sdk/three-class-form.md) — when you write `EskerModel` directly
- [Naming](https://esker.so/docs/protocol/naming.md) — `DOMAIN_ID` and `esker_id` patterns
- [Compatibility](https://esker.so/docs/protocol/compatibility.md) — how schema changes are classified
