Records
EskerModel — the central record contract. Draft vs published, ClassVars vs fields.
EskerModel is the base class for every domain record in Esker. It's a Pydantic v2 BaseModel with two strong opinions: records are values, not objects (frozen=True), and unknown fields fail loudly (extra="forbid").
If you use the decorator, you never write EskerModel directly — it's synthesized for you. If you reach for the three-class form, you write the subclass explicitly.
Shape
from typing import ClassVar
from pydantic import BaseModel, ConfigDict

class EskerModel(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")
    DOMAIN_ID: ClassVar[str]
    schema_version: ClassVar[str]
Two ClassVars. Note: ClassVars, not Pydantic fields. This is the most-flagged footgun in the codebase — see below.
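A short sketch of what the two config flags mean in practice. Quote is a hypothetical stand-in model that copies the same model_config; it is not an Esker class, and the field names are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Quote(BaseModel):
    # Hypothetical stand-in with the same config as EskerModel.
    model_config = ConfigDict(frozen=True, extra="forbid")

    symbol: str
    rate: float


q = Quote(symbol="UST1M", rate=5.31)

# frozen=True: assignment after construction is rejected.
try:
    q.rate = 5.40
    mutated = True
except ValidationError:
    mutated = False

# To "change" a record, derive a new value instead of mutating.
q2 = q.model_copy(update={"rate": 5.40})

# extra="forbid": unknown keys fail at construction time.
try:
    Quote(symbol="UST1M", rate=5.31, tenor="1m")
    accepted_extra = True
except ValidationError:
    accepted_extra = False
```

Because frozen models are hashable and compare by field values, two records built from the same data are interchangeable, which is what "values, not objects" amounts to here.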
Subclassing
from typing import Annotated, ClassVar
from datetime import date
from pydantic import Field
from esker import EskerModel

class TreasuryYieldCurve(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.treasury.yields"
    schema_version: ClassVar[str] = "2.0.0"

    quote_date: date
    rate_1m: float | None = None
    rate_3m: float | None = None
    rate_6m: float | None = None
    rate_1y: float | None = None
__init_subclass__ enforces:
- DOMAIN_ID must be in cls.__dict__ (or inherited from a non-EskerModel base).
- DOMAIN_ID must match ^[a-z0-9]+(\.[a-z0-9]+)+$ (lowercase, dot-separated, ≥ 2 segments).
- Missing DOMAIN_ID raises TypeError("X must declare a DOMAIN_ID class variable").
schema_version is read by cls.declared_version(). Missing → ValueError("X must declare a schema_version class variable").
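The DOMAIN_ID checks above can be sketched with a plain class. This is an illustration under assumptions, not Esker's implementation: RecordBase is a hypothetical stand-in for EskerModel, the MRO walk is one reading of "inherited from a non-EskerModel base", and the exception raised for a malformed id is a guess, since the text only specifies the missing-id case.

```python
import re

DOMAIN_ID_PATTERN = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)+$")


class RecordBase:
    """Hypothetical stand-in for EskerModel (plain class, no Pydantic)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        domain_id = None
        for klass in cls.__mro__:
            if "DOMAIN_ID" in vars(klass):
                # Accept the class's own declaration, or one from a mixin that
                # is not itself a record; reject one inherited from a record.
                if klass is cls or not issubclass(klass, RecordBase):
                    domain_id = vars(klass)["DOMAIN_ID"]
                break
        if domain_id is None:
            raise TypeError(f"{cls.__name__} must declare a DOMAIN_ID class variable")
        if not DOMAIN_ID_PATTERN.match(domain_id):
            # Assumption: the exception type for a malformed id is not documented.
            raise ValueError(f"{cls.__name__}.DOMAIN_ID {domain_id!r} is malformed")


class TreasuryYieldCurve(RecordBase):
    DOMAIN_ID = "us.treasury.yields"


def outcome(name: str, attrs: dict) -> str:
    """Try to define a subclass and report what happened."""
    try:
        type(name, (RecordBase,), attrs)
        return "ok"
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"


missing = outcome("NoId", {})
single_segment = outcome("OneSegment", {"DOMAIN_ID": "us"})
uppercase = outcome("Upper", {"DOMAIN_ID": "US.treasury"})
```

Running the check in __init_subclass__ means a bad declaration fails when the module is imported, well before any record is constructed.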
ClassVar vs Pydantic field
The mistake:
# WRONG — turns schema_version into a per-record field
class Bad(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.bad"
    schema_version: SemVer = "1.0.0"
The schema_version: SemVer = ... form silently:
- Inflates parquet by len(version) * record_count bytes.
- Breaks cls.declared_version() (returns a FieldInfo, not the string).
- Pollutes the hash-stable JSON Schema.
Always use the ClassVar form:
# RIGHT
class Good(EskerModel):
    DOMAIN_ID: ClassVar[str] = "us.good"
    schema_version: ClassVar[str] = "1.0.0"
The decorator path bypasses this trap entirely — it sets these as raw class attributes on a synthesized base, so the inherited ClassVar annotation wins.
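One way to see the difference between the two forms is to ask Pydantic which names it treats as fields. The classes below are hypothetical stand-ins built on a bare BaseModel with the same config, and plain str is used in place of SemVer.

```python
from typing import ClassVar

from pydantic import BaseModel, ConfigDict


class Base(BaseModel):
    # Hypothetical stand-in with the same config as EskerModel.
    model_config = ConfigDict(frozen=True, extra="forbid")


class AsField(Base):
    schema_version: str = "1.0.0"  # plain annotation: becomes a per-record field
    quote: float


class AsClassVar(Base):
    schema_version: ClassVar[str] = "1.0.0"  # ClassVar: stays on the class only
    quote: float


# The field form leaks the version into every serialized record.
field_dump = AsField(quote=1.0).model_dump()

# The ClassVar form keeps records down to their domain fields.
classvar_dump = AsClassVar(quote=1.0).model_dump()

is_field = "schema_version" in AsField.model_fields
is_class_var = "schema_version" in AsClassVar.__class_vars__
```

A check like `"schema_version" not in Model.model_fields` is cheap enough to run in a unit test for every record class.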
Three classmethods to use
@classmethod
def declared_version(cls) -> str: ... # the schema_version string
@classmethod
def domain(cls) -> str: ... # "<DOMAIN_ID>@<version>"
@classmethod
def json_schema(cls) -> dict: ... # cls.model_json_schema()
Use these instead of model_fields["schema_version"].default or model_json_schema() directly. They centralize Pydantic-internals access.
Draft vs Published
EskerModel is the draft shape — only your domain fields. The published shape adds three injected fields:
@classmethod
def published(cls) -> type["EskerModel"]: ...
Generated and process-cached. Returns a class named Published<X> with three extra fields:
| field | type | meaning |
|---|---|---|
| esker_id | EskerID (string) | join key: same entity → same id across datasets |
| esker_source_url | HttpUrl | per-record canonical URL |
| esker_lineage_id | UUID | links to a row in lineage.json |
Only EskerPipeline.run() calls .published()(...) to mint records. User code should not. The TypeScript renderer strips the Published prefix so consumers see <X>.
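For intuition, a generated and process-cached published class can be approximated with pydantic.create_model and functools.cache. This is a sketch of the idea only: the free function published, the SecCompany fields, and the use of plain str for EskerID are assumptions, and it constructs a row by hand purely to show the shape that the pipeline mints.

```python
import functools
from typing import ClassVar
from uuid import UUID, uuid4

from pydantic import BaseModel, ConfigDict, HttpUrl, ValidationError, create_model


class SecCompany(BaseModel):
    """Hypothetical draft record; stands in for an EskerModel subclass."""

    model_config = ConfigDict(frozen=True, extra="forbid")
    DOMAIN_ID: ClassVar[str] = "us.sec.companies"

    cik: str
    name: str


@functools.cache
def published(draft: type[BaseModel]) -> type[BaseModel]:
    """Build Published<X> once per draft class and reuse it afterwards."""
    return create_model(
        f"Published{draft.__name__}",
        __base__=draft,  # keeps the draft fields and its frozen/forbid config
        esker_id=(str, ...),  # assumption: EskerID behaves like a plain string
        esker_source_url=(HttpUrl, ...),
        esker_lineage_id=(UUID, ...),
    )


PublishedSecCompany = published(SecCompany)
row = PublishedSecCompany(
    cik="0000320193",
    name="Example Corp",
    esker_id="esker:us:company:0000320193",
    esker_source_url="https://example.com/companies/0000320193",
    esker_lineage_id=uuid4(),
)

# The draft class still rejects the injected fields.
try:
    SecCompany(cik="0000320193", name="Example Corp", esker_id="esker:us:company:1")
    draft_accepts_esker_id = True
except ValidationError:
    draft_accepts_esker_id = False
```

Caching on the draft class means every caller in the process gets the same Published<X> type, so isinstance checks and schema hashes stay stable.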
Injected fields
The three esker_*-prefixed fields are synthesized at run time by the framework — never set them yourself.
| field | source |
|---|---|
| esker_id | f-string: esker:<jurisdiction>:<entity_type>:<native_id> |
| esker_source_url | _SOURCE_URL_TEMPLATE.format(**fields) or Fetched.source_url |
| esker_lineage_id | UUID per (source_url, fetched_at) batch |
jurisdiction is the first segment of DOMAIN_ID. For us.sec.companies it's us.
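The id and URL derivations can be illustrated with two small helpers. Both function names, the "company" entity type, and the example.com template are hypothetical; only the esker:<jurisdiction>:<entity_type>:<native_id> layout and the template.format(**fields) call come from the table.

```python
def make_esker_id(domain_id: str, entity_type: str, native_id: str) -> str:
    """Hypothetical helper: jurisdiction is the first DOMAIN_ID segment."""
    jurisdiction = domain_id.split(".", 1)[0]
    return f"esker:{jurisdiction}:{entity_type}:{native_id}"


def make_source_url(template: str, fields: dict) -> str:
    """Fill a per-record URL template from the record's own fields."""
    # str.format ignores keyword arguments the template does not mention.
    return template.format(**fields)


record = {"cik": "0000320193", "name": "Example Corp"}
esker_id = make_esker_id("us.sec.companies", "company", record["cik"])
source_url = make_source_url("https://example.com/companies/{cik}", record)
```

Since the id depends only on jurisdiction, entity type, and native id, two datasets describing the same entity would arrive at the same join key independently.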
The extra="forbid" config means trying to set esker_id on the draft raises:
ValidationError: 1 validation error for SecCompany
esker_id
  Extra inputs are not permitted [type=extra_forbidden, ...]
Good — you can't bypass the injection.
What's not a record field
These belong on the manifest, not on rows:
- schema_version (repeated N times across rows is wasteful)
- source_id
- ingested_at
Type recommendations
- Annotated[str, Field(pattern=...)] for constrained strings (CIK, slug, ISO codes). Pydantic validates on construction; the parquet stores plain strings.
- Literal["a", "b"] over Field(pattern=r"^(a|b)$") for enum-like fields. Pydantic emits Literal as JSON Schema enum, which the compat checker can set-diff (additions = minor, removals = major). A pattern emits as opaque text, so any text change reads as breaking.
- AwareDatetime, not naive datetime. Naive datetimes are a bug category Esker refuses to participate in.
- date over datetime for date-only values. Lands in parquet as Arrow date32(), so consumers don't have to parse strings.
- float | None over Optional[float] for nullability. The JSON Schema is identical; the form reads cleaner.
See also
- Pipelines: the decorator path that synthesizes EskerModel for you
- Three-class form: when you write EskerModel directly
- Naming: DOMAIN_ID and esker_id patterns
- Compatibility: how schema changes are classified