Records

EskerModel — the central record contract. Draft vs published, ClassVars vs fields.

EskerModel is the base class for every domain record in Esker. It's a Pydantic v2 BaseModel with two strong opinions: records are values, not objects (frozen=True), and unknown fields fail loudly (extra="forbid").

If you use the decorator, you never write EskerModel directly — it's synthesized for you. If you reach for the three-class form, you write the subclass explicitly.

Shape

from typing import ClassVar
from pydantic import BaseModel, ConfigDict


class EskerModel(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")

    DOMAIN_ID: ClassVar[str]
    schema_version: ClassVar[str]

Both are ClassVars, not Pydantic fields. Declaring them as fields is the most-flagged footgun in the codebase; see ClassVar vs Pydantic field below.

Subclassing

from datetime import date
from typing import ClassVar

from esker import EskerModel


class TreasuryYieldCurve(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.treasury.yields"
    schema_version: ClassVar[str] = "2.0.0"

    quote_date: date
    rate_1m: float | None = None
    rate_3m: float | None = None
    rate_6m: float | None = None
    rate_1y: float | None = None

__init_subclass__ enforces:

  • DOMAIN_ID must be in cls.__dict__, or inherited from a base that is not itself an EskerModel. A DOMAIN_ID inherited from a parent record does not count.
  • DOMAIN_ID must match ^[a-z0-9]+(\.[a-z0-9]+)+$ (lowercase, dot-separated, ≥ 2 segments).
  • Missing DOMAIN_ID raises TypeError("X must declare a DOMAIN_ID class variable").
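The three rules above can be sketched in plain Python. This is a minimal sketch, not Esker's implementation: RecordBase is a hypothetical stand-in for EskerModel without Pydantic, _find_domain_id is a hypothetical helper, and raising ValueError for a malformed id is an assumption, since only the missing-id error is documented.

```python
from __future__ import annotations

import re
from typing import ClassVar

# Lowercase, dot-separated, at least two segments (the documented pattern).
_DOMAIN_ID_RE = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)+$")


def _find_domain_id(cls: type) -> str | None:
    """Hypothetical helper: DOMAIN_ID from cls itself or from a non-record base."""
    if "DOMAIN_ID" in cls.__dict__:
        return cls.__dict__["DOMAIN_ID"]
    for base in cls.__mro__[1:]:
        # A parent record's DOMAIN_ID does not count: each record declares its own.
        if issubclass(base, RecordBase):
            continue
        if "DOMAIN_ID" in base.__dict__:
            return base.__dict__["DOMAIN_ID"]
    return None


class RecordBase:
    """Hypothetical stand-in for EskerModel, minus Pydantic."""

    DOMAIN_ID: ClassVar[str]  # annotation only: nothing lands in RecordBase.__dict__

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        domain_id = _find_domain_id(cls)
        if domain_id is None:
            raise TypeError(f"{cls.__name__} must declare a DOMAIN_ID class variable")
        if not isinstance(domain_id, str) or not _DOMAIN_ID_RE.fullmatch(domain_id):
            # Assumed exception type: the source does not name one for this case.
            raise ValueError(f"{cls.__name__}.DOMAIN_ID {domain_id!r} is not a valid domain id")


class TreasuryYieldCurve(RecordBase):
    DOMAIN_ID: ClassVar[str] = "us.treasury.yields"  # accepted

# class Child(TreasuryYieldCurve): pass
#   -> TypeError: Child must declare a DOMAIN_ID class variable
```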

schema_version is read by cls.declared_version(), which raises ValueError("X must declare a schema_version class variable") if it is missing.

ClassVar vs Pydantic field

The mistake:

# WRONG — turns schema_version into a per-record field
class Bad(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.bad"
    schema_version: SemVer = "1.0.0"

The schema_version: SemVer = ... form silently:

  1. Adds a schema_version column to the parquet output: len(version) * record_count bytes of raw string data before encoding and compression.
  2. Breaks cls.declared_version() (returns a FieldInfo, not the string).
  3. Pollutes the hash-stable JSON Schema.

Always use the ClassVar form:

# RIGHT
class Good(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.good"
    schema_version: ClassVar[str] = "1.0.0"
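Python's standard dataclasses apply the same ClassVar rule as Pydantic, which makes them a dependency-free way to see the difference. This uses dataclasses as a stand-in for EskerModel; BadRow and GoodRow are hypothetical.

```python
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import ClassVar


@dataclass(frozen=True)
class BadRow:
    quote_date: date
    schema_version: str = "1.0.0"  # no ClassVar: becomes a per-record field


@dataclass(frozen=True)
class GoodRow:
    schema_version: ClassVar[str] = "1.0.0"  # ClassVar: excluded from the fields
    quote_date: date


bad = BadRow(date(2024, 1, 2))
good = GoodRow(date(2024, 1, 2))

print([f.name for f in fields(BadRow)])   # ['quote_date', 'schema_version']
print([f.name for f in fields(GoodRow)])  # ['quote_date']
print("schema_version" in asdict(bad))    # True: serialized with every row
print("schema_version" in asdict(good))   # False
print(GoodRow.schema_version)             # 1.0.0

# A field also lets individual rows disagree about their own schema version.
print(BadRow(date(2024, 1, 2), schema_version="9.9.9").schema_version)  # 9.9.9
```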

The decorator path bypasses this trap entirely — it sets these as raw class attributes on a synthesized base, so the inherited ClassVar annotation wins.

Three classmethods to use

@classmethod
def declared_version(cls) -> str: ...        # the schema_version string
@classmethod
def domain(cls) -> str: ...                  # "<DOMAIN_ID>@<version>"
@classmethod
def json_schema(cls) -> dict: ...            # cls.model_json_schema()

Use these instead of reading model_fields["schema_version"].default or calling model_json_schema() directly; they keep access to Pydantic internals in one place.
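A sketch of the two string-returning classmethods, assuming they read the ClassVars directly. RecordContract is a hypothetical plain-Python stand-in; json_schema() is left out because it only delegates to Pydantic's model_json_schema(). Treating any non-string value as missing is an assumption of this sketch.

```python
from typing import ClassVar


class RecordContract:
    """Hypothetical stand-in for EskerModel's metadata classmethods."""

    DOMAIN_ID: ClassVar[str]
    schema_version: ClassVar[str]

    @classmethod
    def declared_version(cls) -> str:
        # Annotation-only ClassVars have no value, so getattr falls back to None.
        version = getattr(cls, "schema_version", None)
        if not isinstance(version, str):
            raise ValueError(f"{cls.__name__} must declare a schema_version class variable")
        return version

    @classmethod
    def domain(cls) -> str:
        # "<DOMAIN_ID>@<version>"
        return f"{cls.DOMAIN_ID}@{cls.declared_version()}"


class TreasuryYieldCurve(RecordContract):
    DOMAIN_ID: ClassVar[str] = "us.treasury.yields"
    schema_version: ClassVar[str] = "2.0.0"


print(TreasuryYieldCurve.domain())  # us.treasury.yields@2.0.0
```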

Draft vs Published

EskerModel is the draft shape: only your domain fields. The published shape is derived from it by a classmethod:

@classmethod
def published(cls) -> type["EskerModel"]: ...

The class is generated on first call and cached for the life of the process. It is named Published<X> and carries your domain fields plus three injected ones:

field              type               meaning
esker_id           EskerID (string)   join key: same entity → same id across datasets
esker_source_url   HttpUrl            per-record canonical URL
esker_lineage_id   UUID               links to a row in lineage.json

Only EskerPipeline.run() calls .published()(...) to mint records. User code should not. The TypeScript renderer strips the Published prefix so consumers see <X>.
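Generating and caching the published class can be sketched with the standard library. Here dataclasses.make_dataclass stands in for Pydantic model generation (pydantic.create_model with __base__ would be the Pydantic-side analogue), functools.cache provides the per-process cache, SecCompany and its fields are hypothetical, and plain str replaces the EskerID and HttpUrl types.

```python
from dataclasses import dataclass, fields, make_dataclass
from functools import cache
from uuid import UUID


@dataclass(frozen=True)
class SecCompany:
    """Hypothetical draft record: domain fields only."""

    cik: str
    name: str


@cache  # one generated class per draft class, for the life of the process
def published(draft: type) -> type:
    return make_dataclass(
        f"Published{draft.__name__}",
        [
            ("esker_id", str),          # EskerID in Esker; plain str here
            ("esker_source_url", str),  # HttpUrl in Esker; plain str here
            ("esker_lineage_id", UUID),
        ],
        bases=(draft,),
        frozen=True,  # a dataclass inheriting from a frozen one must also be frozen
    )


PublishedSecCompany = published(SecCompany)
print(PublishedSecCompany.__name__)  # PublishedSecCompany
print([f.name for f in fields(PublishedSecCompany)])
# ['cik', 'name', 'esker_id', 'esker_source_url', 'esker_lineage_id']
print(published(SecCompany) is PublishedSecCompany)  # True: served from the cache
```

The draft class itself is never modified; only the generated subclass carries the esker_* fields.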

Injected fields

The three esker_*-prefixed fields are synthesized at run time by the framework — never set them yourself.

field              source
esker_id           f-string: esker:<jurisdiction>:<entity_type>:<native_id>
esker_source_url   _SOURCE_URL_TEMPLATE.format(**fields) or Fetched.source_url
esker_lineage_id   one UUID per (source_url, fetched_at) batch

jurisdiction is the first segment of DOMAIN_ID. For us.sec.companies it's us.
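The three injection rules can be sketched as small functions. All function names are hypothetical, the entity type, ids, and URLs are made up for illustration, and the sketch assumes the template takes precedence, falling back to Fetched.source_url when no _SOURCE_URL_TEMPLATE is defined.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone


def make_esker_id(domain_id: str, entity_type: str, native_id: str) -> str:
    jurisdiction = domain_id.split(".", 1)[0]  # first DOMAIN_ID segment
    return f"esker:{jurisdiction}:{entity_type}:{native_id}"


def make_source_url(template: str | None, fields: dict[str, object], fetched_url: str) -> str:
    # str.format ignores keyword arguments the template does not use.
    return template.format(**fields) if template else fetched_url


_lineage_ids: dict[tuple[str, datetime], uuid.UUID] = {}


def make_lineage_id(source_url: str, fetched_at: datetime) -> uuid.UUID:
    # One UUID per (source_url, fetched_at) batch: every row in the batch shares it.
    key = (source_url, fetched_at)
    if key not in _lineage_ids:
        _lineage_ids[key] = uuid.uuid4()
    return _lineage_ids[key]


print(make_esker_id("us.sec.companies", "company", "0000000001"))
# esker:us:company:0000000001
print(make_source_url(
    "https://example.com/company/{cik}",
    {"cik": "0000000001", "name": "Example"},
    "https://example.com/bulk",
))
# https://example.com/company/0000000001
fetched_at = datetime(2024, 1, 2, tzinfo=timezone.utc)
batch_url = "https://example.com/bulk"
print(make_lineage_id(batch_url, fetched_at) == make_lineage_id(batch_url, fetched_at))  # True
```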

The extra="forbid" config means trying to set esker_id on the draft raises:

ValidationError: 1 validation error for SecCompany
esker_id
  Extra inputs are not permitted [type=extra_forbidden, ...]

Good — you can't bypass the injection.

What's not a record field

These belong on the manifest, not on rows:

  • schema_version (repeating it on all N rows is wasteful)
  • source_id
  • ingested_at

Type recommendations

  • Annotated[str, Field(pattern=...)] for constrained strings (CIK, slug, ISO codes). Pydantic validates on construction; the parquet stores plain strings.
  • Literal["a", "b"] over Field(pattern=r"^(a|b)$") for enum-like fields. Pydantic emits Literal as JSON Schema enum, which the compat checker can set-diff (additions = minor, removals = major). A pattern emits as opaque text — any text change reads as breaking.
  • AwareDatetime, not naive datetime. Naive datetimes are a bug category Esker refuses to participate in.
  • date over datetime for date-only values. Lands in parquet as Arrow date32() — consumers don't have to parse strings.
  • float | None over Optional[float] for nullability. The JSON Schema is identical; the union form reads cleaner.
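The Literal-over-pattern advice follows from how a set diff works. A minimal sketch of such a classifier over JSON Schema property fragments, assuming they look like Pydantic's output ({"enum": [...]} for a multi-value Literal, {"pattern": ...} for a constrained string); classify_change and the "unchanged" label are hypothetical, not Esker's compat checker.

```python
def classify_change(old: dict, new: dict) -> str:
    if "enum" in old and "enum" in new:
        removed = set(old["enum"]) - set(new["enum"])
        added = set(new["enum"]) - set(old["enum"])
        if removed:
            return "major"  # a value existing data may hold no longer validates
        if added:
            return "minor"  # every old value still validates
        return "unchanged"
    # Anything else (e.g. a regex pattern) is compared as opaque text.
    return "unchanged" if old == new else "major"


print(classify_change({"enum": ["a", "b"]}, {"enum": ["a", "b", "c"]}))     # minor
print(classify_change({"enum": ["a", "b"]}, {"enum": ["a"]}))               # major
print(classify_change({"pattern": "^(a|b)$"}, {"pattern": "^(a|b|c)$"}))  # major
```

The same widening (adding "c") is minor when expressed as an enum and major when expressed as a pattern.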

See also

  • Pipelines: the decorator path that synthesizes EskerModel for you
  • Three-class form: when you write EskerModel directly
  • Naming: DOMAIN_ID and esker_id patterns
  • Compatibility: how schema changes are classified