Glossary
Esker terms with grounded definitions.
Definitions favor "what it actually is" over conceptual.
binding
A bare name → <owner>/<name> mapping. Lives in pyproject.toml [tool.esker.datasets] (project-scoped) or ~/.esker/config.toml [datasets] (global). Project bindings can be version-pinned via esker.lock. See Bindings.
cadence
A free-form string ClassVar on a pipeline indicating how often it should run ("daily", "weekly"). Not behavior — Esker has no scheduler. Surfaced in esker list only.
CompatReport
Output of the schema diff. A classification: breaking, additive, required_bump. See Compatibility.
consumer
Anyone who reads a published dataset (via esker.get, esker pull, esker view, etc.). The opposite of a publisher.
DatasetManifest
Sidecar metadata for a published dataset. One per run. The hub's source of truth for "what does this dataset look like at version V". POSTed to /datasets/<owner>/<name> on push. See Manifests.
DatasetRef
<owner>/<name>[@<version>]. Frozen value object. Versioned for write paths; versionless = "latest" for reads. Bare-name shortcut lives in bindings.resolve, not in the parser. Internal — users pass strings.
DOMAIN_ID
The bare name of a dataset. Lowercase a–z0–9 dot-separated, ≥ 2 segments. First segment is the jurisdiction. Examples: us.sec.companies, ca.corporations.registry. Pattern: ^[a-z0-9]+(\.[a-z0-9]+)+$.
EskerEntity
The "what is this thing" record (entity-resolver concept). Carries esker_id, entity_type, jurisdiction, name, discovered_at. Currently not consumed by any path; deferred to a future release.
esker_id
The cross-dataset join key on a single conceptual entity. Format: esker:<jurisdiction>:<entity_type>:<native_id>. Two records about the same corporation share one esker_id.
Built at run time by EskerPipeline.run().
esker_lineage_id
Per-record reference to a lineage batch (a unique (source_url, fetched_at) pair). UUID, stored as a column in parquet, joined against <domain>.lineage.json's batches[*].lineage_id.
EskerModel
Pydantic base for every domain record. frozen=True, extra="forbid". Subclass declares DOMAIN_ID and schema_version as ClassVars. Variants: draft (no esker_*) and published (with the three injected fields, via .published()). See Records.
EskerPipeline
ABC binding a SOURCE to a SCHEMA via a transform. Implementing class declares 6 ClassVars. .run(output_dir) produces a RunResult and writes parquet + lineage. See Three-class form.
EskerSource
ABC for raw data origins. Yields Fetched envelopes via fetch_all() (and optionally per-id fetch(id)). Provides per-id caching via fetch_cached. Subclasses declare SOURCE_ID. See Sources.
entity_type
Lowercase-letters-only token identifying the kind of entity a record describes (corp, rocket, curve). Embedded in esker_id.
Fetched
A (raw, source_url, fetched_at) envelope yielded by sources. The thing pipelines transform.
fixture
A (raw_<label>.json, expected_<label>.json) pair under a fixture dir. The harness diffs pipeline.transform(raw).model_dump(mode="json") against expected. See Fixtures.
hub
The commercial half of Esker — esker.so. The SDK talks to it over HTTP via esker.client.hub. Default localhost:3001.
jurisdiction
First dot-segment of DOMAIN_ID. Coded as 2+ lowercase letters. Used to construct esker_id. Conventional values: us, ca, global, un, eu.
key (or _KEY)
The field on the model holding the native identifier (e.g. "cik" for SEC, "rocket_id" for SpaceX). Used by pipeline.run() to extract native_id and build esker_id.
LineageBatch / LineageBundle
LineageBatch: a unique (source_url, fetched_at, lineage_id) triple. LineageBundle: the manifest_version + run_id + list of batches written as <domain>.lineage.json. See Lineage.
owner / OwnerHandle
A publishing namespace. GitHub-style handle (1–39 chars, lowercase a–z0–9 with non-doubled hyphens). Single global namespace shared by users and orgs. Reserved words blocked. See Handles.
pipeline (decorator)
@pipeline("<domain>@<semver>", entity_type=, key=, url=|source=) — the shorthand that synthesizes the three classes (Source, Schema, Pipeline) from a plain class with annotations and a transform classmethod. The 80% authoring path. See Pipelines.
published variant
The EskerModel subclass with esker_id, esker_source_url, esker_lineage_id fields added. Class name: Published<X>. Generated by cls.published() and process-cached.
REGISTRY
Process-global dict[str, type[EskerPipeline]] populated at import time by @register / @pipeline. Discovered via the esker.pipelines entry-point group.
RunResult
The dataclass returned by EskerPipeline.run(). Carries identity (run_id, paths), timing, hashes, and a manifest() builder.
SemVer
<major>.<minor>.<patch> only. No pre-releases, no build metadata. Pattern ^\d+\.\d+\.\d+$.
schema_version
A ClassVar on EskerModel subclasses giving the current version of this record's shape. Distinct from pipeline_version (which versions the transform code). Both go on the manifest. See Manifests.
source_id
SOURCE.SOURCE_ID — string identifying the origin. The decorator overrides this to domain_id. Default for BulkJsonSource / BulkCsvSource is "bulk-json" / "bulk-csv" if not overridden.
source_digest
sha256(<url1>\x00<iso1>\n<url2>\x00<iso2>\n…) — fingerprint of the unique fetch set in a run. On the manifest. Stable across reruns iff URLs and timestamps are stable (almost never; fetched_at = datetime.now()).
supersedes
run_id of the previous publish at the same schema_version. Backlink for the run-chain within a single version. None for first publish or first publish at a new version.
transform
User code. transform(self|cls, raw: dict) -> EskerModel. Pure function; the fixture harness compares its output byte-for-byte. No clocks, no RNG, no network.
unbound
A bare name with no project or global binding. bindings.resolve raises UnboundDatasetError.
See also
- Naming — the four ID patterns
- Public API — what's exported