# Glossary

> Esker terms with grounded definitions.

Definitions favor "what it actually is" over conceptual.

## binding

A bare name → `<owner>/<name>` mapping. Lives in `pyproject.toml [tool.esker.datasets]` (project-scoped) or `~/.esker/config.toml [datasets]` (global). Project bindings can be version-pinned via `esker.lock`. See [Bindings](https://esker.so/docs/sdk/bindings.md).

## cadence

A free-form string `ClassVar` on a pipeline indicating how often it should run (`"daily"`, `"weekly"`). **Not behavior** — Esker has no scheduler. Surfaced in `esker list` only.

## CompatReport

Output of the schema diff. A classification: `breaking`, `additive`, `required_bump`. See [Compatibility](https://esker.so/docs/protocol/compatibility.md).

## consumer

Anyone who reads a published dataset (via `esker.get`, `esker pull`, `esker view`, etc.). The opposite of a _publisher_.

## DatasetManifest

Sidecar metadata for a published dataset. One per run. The hub's source of truth for "what does this dataset look like at version V". POSTed to `/datasets/<owner>/<name>` on push. See [Manifests](https://esker.so/docs/protocol/manifests.md).

## DatasetRef

`<owner>/<name>[@<version>]`. Frozen value object. Versioned for write paths; versionless = "latest" for reads. Bare-name shortcut lives in `bindings.resolve`, not in the parser. Internal — users pass strings.

## DOMAIN_ID

The bare name of a dataset. Lowercase a–z0–9 dot-separated, ≥ 2 segments. First segment is the **jurisdiction**. Examples: `us.sec.companies`, `ca.corporations.registry`. Pattern: `^[a-z0-9]+(\.[a-z0-9]+)+$`.

## EskerEntity

The "what is this thing" record (entity-resolver concept). Carries `esker_id`, `entity_type`, `jurisdiction`, `name`, `discovered_at`. Currently not consumed by any path; deferred to a future release.

## esker_id

The cross-dataset join key on a single conceptual entity. Format: `esker:<jurisdiction>:<entity_type>:<native_id>`. Two records about the same corporation share one `esker_id`.

Built at run time by `EskerPipeline.run()`.

## esker_lineage_id

Per-record reference to a lineage batch (a unique `(source_url, fetched_at)` pair). UUID, stored as a column in parquet, joined against `<domain>.lineage.json`'s `batches[*].lineage_id`.

## EskerModel

Pydantic base for every domain record. `frozen=True, extra="forbid"`. Subclass declares `DOMAIN_ID` and `schema_version` as `ClassVar`s. Variants: draft (no `esker_*`) and published (with the three injected fields, via `.published()`). See [Records](https://esker.so/docs/sdk/records.md).

## EskerPipeline

ABC binding a `SOURCE` to a `SCHEMA` via a `transform`. Implementing class declares 6 ClassVars. `.run(output_dir)` produces a `RunResult` and writes parquet + lineage. See [Three-class form](https://esker.so/docs/sdk/three-class-form.md).

## EskerSource

ABC for raw data origins. Yields `Fetched` envelopes via `fetch_all()` (and optionally per-id `fetch(id)`). Provides per-id caching via `fetch_cached`. Subclasses declare `SOURCE_ID`. See [Sources](https://esker.so/docs/sdk/sources.md).

## entity_type

Lowercase-letters-only token identifying the kind of entity a record describes (`corp`, `rocket`, `curve`). Embedded in `esker_id`.

## Fetched

A `(raw, source_url, fetched_at)` envelope yielded by sources. The thing pipelines transform.

## fixture

A `(raw_<label>.json, expected_<label>.json)` pair under a fixture dir. The harness diffs `pipeline.transform(raw).model_dump(mode="json")` against `expected`. See [Fixtures](https://esker.so/docs/sdk/fixtures.md).

## hub

The commercial half of Esker — `esker.so`. The SDK talks to it over HTTP via `esker.client.hub`. Default `localhost:3001`.

## jurisdiction

First dot-segment of `DOMAIN_ID`. Coded as 2+ lowercase letters. Used to construct `esker_id`. Conventional values: `us`, `ca`, `global`, `un`, `eu`.

## key (or `_KEY`)

The field on the model holding the native identifier (e.g. `"cik"` for SEC, `"rocket_id"` for SpaceX). Used by `pipeline.run()` to extract `native_id` and build `esker_id`.

## LineageBatch / LineageBundle

`LineageBatch`: a unique `(source_url, fetched_at, lineage_id)` triple. `LineageBundle`: the manifest_version + run_id + list of batches written as `<domain>.lineage.json`. See [Lineage](https://esker.so/docs/protocol/lineage.md).

## owner / OwnerHandle

A publishing namespace. GitHub-style handle (1–39 chars, lowercase a–z0–9 with non-doubled hyphens). Single global namespace shared by users and orgs. Reserved words blocked. See [Handles](https://esker.so/docs/protocol/handles.md).

## pipeline (decorator)

`@pipeline("<domain>@<semver>", entity_type=, key=, url=|source=)` — the shorthand that synthesizes the three classes (Source, Schema, Pipeline) from a plain class with annotations and a `transform` classmethod. The 80% authoring path. See [Pipelines](https://esker.so/docs/sdk/pipelines.md).

## published variant

The `EskerModel` subclass with `esker_id`, `esker_source_url`, `esker_lineage_id` fields added. Class name: `Published<X>`. Generated by `cls.published()` and process-cached.

## REGISTRY

Process-global `dict[str, type[EskerPipeline]]` populated at import time by `@register` / `@pipeline`. Discovered via the `esker.pipelines` entry-point group.

## RunResult

The dataclass returned by `EskerPipeline.run()`. Carries identity (run_id, paths), timing, hashes, and a `manifest()` builder.

## SemVer

`<major>.<minor>.<patch>` only. No pre-releases, no build metadata. Pattern `^\d+\.\d+\.\d+$`.

## schema_version

A `ClassVar` on `EskerModel` subclasses giving the current version of _this record's shape_. Distinct from `pipeline_version` (which versions the transform code). Both go on the manifest. See [Manifests](https://esker.so/docs/protocol/manifests.md).

## source_id

`SOURCE.SOURCE_ID` — string identifying the origin. The decorator overrides this to `domain_id`. Default for `BulkJsonSource` / `BulkCsvSource` is `"bulk-json"` / `"bulk-csv"` if not overridden.

## source_digest

`sha256(<url1>\x00<iso1>\n<url2>\x00<iso2>\n…)` — fingerprint of the unique fetch set in a run. On the manifest. Stable across reruns iff URLs and timestamps are stable (almost never; `fetched_at = datetime.now()`).

## supersedes

`run_id` of the previous publish at the _same_ `schema_version`. Backlink for the run-chain within a single version. `None` for first publish or first publish at a new version.

## transform

User code. `transform(self|cls, raw: dict) -> EskerModel`. Pure function; the fixture harness compares its output byte-for-byte. No clocks, no RNG, no network.

## unbound

A bare name with no project or global binding. `bindings.resolve` raises `UnboundDatasetError`.

## See also

- [Naming](https://esker.so/docs/protocol/naming.md) — the four ID patterns
- [Public API](https://esker.so/docs/sdk/public-api.md) — what's exported
