Errors and footguns

Every error path, verbatim, plus the non-obvious behaviors worth knowing in advance.

This page collects the exact error messages the CLI emits and the behaviors that surprise people. Read top-to-bottom once; come back when something goes red.

Error format

Errors render as two lines. The first, in red, gives the exception type and message. The second, dimmed, is a → file:line pointer into your code.

  ValueError: esker_id jurisdiction 'us' does not match DOMAIN_ID jurisdiction 'ca'
  → my_pipelines/sec_companies.py:34

-v / --verbose adds the full traceback after a blank line.

The pointer is the user frame — picked from the deepest stack frame containing your pipeline file rather than the SDK internals.
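A minimal sketch of how such a pointer could be chosen, assuming the path of the pipeline file is known up front. The names pick_user_frame and pipeline_file are hypothetical and are not part of the SDK; transform is only a stand-in for failing user code.

```python
import traceback
from typing import Optional


def pick_user_frame(exc: BaseException, pipeline_file: str) -> Optional[str]:
    """Return 'file:line' for the deepest traceback frame in pipeline_file."""
    frames = traceback.extract_tb(exc.__traceback__)
    # extract_tb orders frames outermost first, so scan from the end
    for frame in reversed(frames):
        if frame.filename == pipeline_file:
            return f"{frame.filename}:{frame.lineno}"
    return None  # no user frame found; a caller could print the message only


def transform():
    # stand-in for user pipeline code that fails
    raise ValueError("jurisdiction mismatch")


this_file = transform.__code__.co_filename
try:
    transform()
except ValueError as err:
    pointer = pick_user_frame(err, this_file)
    missing = pick_user_frame(err, "/not/a/real/file.py")
```

Scanning from the innermost frame outward means a helper called from your pipeline file is skipped in favor of the line in your file that called it.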

Pipeline lookup

$ esker run nonexistent.foo
  No pipeline registered for domain 'nonexistent.foo'

Same shape from test, check, push, schema. Caused by KeyError from the registry.

Bare-name resolution

$ esker view us.sec.companies   # no binding
  no binding for 'us.sec.companies' · run 'esker add <owner>/us.sec.companies' or use a full ref

Raised by bindings.resolve as UnboundDatasetError. Add a binding or use a full ref.

Schema with no binding (special case)

$ esker schema us.sec.companies   # registered locally, no binding
  us.sec.companies@1.0.0
  via local

  <field-table>

esker schema skips the bindings lookup entirely if the bare name matches a locally registered pipeline. The header drops the owner prefix because there is no published owner yet.

--remote forces the bindings lookup; an unbound name then fails as elsewhere.

Invalid refs

$ esker view Bad/Ref
  ValueError: invalid owner 'Bad'

DatasetRef.__post_init__ validates owner → name → version in that order. Uppercase fails the regex.
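The ordering can be illustrated with a small dataclass. This is a sketch under stated assumptions: RefSketch is a hypothetical stand-in for DatasetRef, and all three regular expressions are invented for illustration. Only the owner → name → version order and the 'invalid owner' message come from the text above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed patterns, for illustration only; the real rules may differ
OWNER_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")
NAME_RE = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)+$")
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


@dataclass(frozen=True)
class RefSketch:
    owner: str
    name: str
    version: Optional[str] = None

    def __post_init__(self) -> None:
        # Checks run owner -> name -> version; the first failure wins
        if not OWNER_RE.match(self.owner):
            raise ValueError(f"invalid owner {self.owner!r}")
        if not NAME_RE.match(self.name):
            raise ValueError(f"invalid name {self.name!r}")
        if self.version is not None and not VERSION_RE.match(self.version):
            raise ValueError(f"invalid version {self.version!r}")


good = RefSketch("archie", "us.sec.companies", "1.0.0")
try:
    RefSketch("Bad", "Ref")  # both parts are invalid; owner is reported
except ValueError as exc:
    first_error = str(exc)
```

Because the owner check runs first, a ref with several invalid parts reports only the owner.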

Auth gate

$ esker push us.sec.companies   # no creds
  not signed in — run 'esker login'

Raised by auth.auth_header() as CredentialsError (subclass of HubError, but renders without the hub 0: prefix).

esker check requires either credentials or --owner:

$ esker check us.sec.companies   # neither
  not signed in — run 'esker login'

Whoami without credentials

$ esker whoami   # no creds
  not signed in
  → run 'esker login'

Two-line: red + dim hint.

Hub down (network)

$ esker manifest archie/us.sec.companies
  RemoteDisconnected: Remote end closed connection without response

$ esker check us.sec.companies
  ConnectionRefusedError: [Errno 61] Connection refused

Transport failures are wrapped at the hub.py boundary as HubUnreachableError, a subclass of HubError. Every CLI command's except HubError handler therefore catches 4xx/5xx responses and unreachable-hub failures uniformly.
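A sketch of the wrapping pattern, assuming the boundary is a single function that runs each request. The class names come from the text above; call_hub, its request_fn parameter, and the message format are hypothetical.

```python
import http.client


class HubError(Exception):
    """Base class for hub failures."""


class HubUnreachableError(HubError):
    """Transport-level failure: the hub could not be reached."""


def call_hub(request_fn):
    """Run request_fn, converting transport errors to HubUnreachableError."""
    try:
        return request_fn()
    except (OSError, http.client.HTTPException) as exc:
        # ConnectionRefusedError is an OSError; RemoteDisconnected is both
        raise HubUnreachableError(f"{type(exc).__name__}: {exc}") from exc


def refused():
    # simulates a hub that is not listening
    raise ConnectionRefusedError(61, "Connection refused")


try:
    call_hub(refused)
except HubError as exc:  # one handler covers every hub failure
    caught = exc
```

Raising with from exc keeps the original exception available as __cause__, which matters when -v prints the full traceback.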

Owner handle validation

$ esker config set-owner BadHandle
  invalid handle 'BadHandle'

$ esker config set-owner api
  invalid handle 'api'

Same red message for length, regex, and reserved-word failures. The message doesn't say why it's invalid — see Handles.
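The single message falls out naturally when all three checks feed one condition. The sketch below is illustrative only: the length bounds, the regex, and the reserved-word set are assumptions, and validate_handle is a hypothetical name. See Handles for the real rules.

```python
import re

# All three rules are assumptions for illustration
HANDLE_RE = re.compile(r"^[a-z][a-z0-9-]*$")
MIN_LEN, MAX_LEN = 3, 32
RESERVED = {"api", "admin", "www"}


def validate_handle(handle: str) -> str:
    ok = (
        MIN_LEN <= len(handle) <= MAX_LEN
        and HANDLE_RE.match(handle) is not None
        and handle not in RESERVED
    )
    if not ok:
        # One message for every failure mode, mirroring the CLI output
        raise ValueError(f"invalid handle {handle!r}")
    return handle


messages = []
for candidate in ("BadHandle", "api"):
    try:
        validate_handle(candidate)
    except ValueError as exc:
        messages.append(str(exc))
```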

Visibility validation

$ esker visibility archie/foo unknown
  setting must be 'public' or 'private'

$ esker visibility archie/foo private
  private not yet supported · landing in phase 2

Phase 1 only accepts public. private exits 1 without contacting the server.

EskerModel construction

>>> SecCompany(cik="0000320193", esker_id="esker:us:corp:0000320193")
ValidationError: 1 validation error for SecCompany
esker_id
  Extra inputs are not permitted [type=extra_forbidden, ...]

extra="forbid" blocks setting esker_* on the draft. The injection happens inside EskerPipeline.run(); user code can't bypass it.

Subclass enforcement

>>> class Bad(EskerModel):
...     x: int = 0
TypeError: Bad must declare a DOMAIN_ID class variable

>>> class Bad2(EskerModel):
...     DOMAIN_ID: ClassVar[str] = 'BAD-ID'
...     schema_version: ClassVar[str] = '1.0.0'
TypeError: Bad2.DOMAIN_ID 'BAD-ID' must match ^[a-z0-9]+(\.[a-z0-9]+)+$ (lowercase a-z0-9, dot-separated)

>>> NoVersion.declared_version()   # no schema_version ClassVar
ValueError: NoVersion must declare a schema_version class variable
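Checks like these can run at class-definition time through __init_subclass__. The sketch below uses a plain class, ModelSketch, as a hypothetical stand-in; how the real EskerModel performs the check is not stated on this page. The messages and the regex are copied from the output above.

```python
import re

DOMAIN_ID_RE = re.compile(r"^[a-z0-9]+(\.[a-z0-9]+)+$")


class ModelSketch:
    """Plain-class stand-in for EskerModel (hypothetical)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        domain_id = getattr(cls, "DOMAIN_ID", None)
        if domain_id is None:
            raise TypeError(f"{cls.__name__} must declare a DOMAIN_ID class variable")
        if not DOMAIN_ID_RE.match(domain_id):
            raise TypeError(
                f"{cls.__name__}.DOMAIN_ID {domain_id!r} must match "
                f"{DOMAIN_ID_RE.pattern} (lowercase a-z0-9, dot-separated)"
            )

    @classmethod
    def declared_version(cls) -> str:
        # Checked lazily, on call, rather than at class creation
        version = getattr(cls, "schema_version", None)
        if version is None:
            raise ValueError(f"{cls.__name__} must declare a schema_version class variable")
        return version


class Good(ModelSketch):
    DOMAIN_ID = "us.sec.companies"
    schema_version = "1.0.0"


try:
    class Bad(ModelSketch):
        x: int = 0
except TypeError as exc:
    bad_message = str(exc)

try:
    class Bad2(ModelSketch):
        DOMAIN_ID = "BAD-ID"
except TypeError as exc:
    bad2_message = str(exc)
```

The DOMAIN_ID checks fail while the class statement executes, so the error surfaces at import. The schema_version check only fails when declared_version() is called.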

Decorator validation

All TypeError. All fire at module-import (decoration) time:

TypeError: @pipeline ref must be '<domain>@<semver>', got 'badref'
TypeError: @pipeline entity_type must match /^[a-z]+$/, got 'Corp1'
TypeError: @pipeline requires exactly one of `url=` or `source=`
TypeError: @pipeline key='nonexistent' is not a field on E
TypeError: @pipeline class NoTransform must define `transform(cls, raw) -> cls` as a classmethod
TypeError: @pipeline decorates plain classes; use the explicit three-class form when subclassing EskerModel directly.

These propagate up from the entry-point load. The CLI command exits 1.

Source URL template misuse

KeyError: source_url template 'https://example.com/{nonexistent_field}'
references field 'nonexistent_field' which is not on X
(available: ['name', 'wid'])

The pipeline wraps str.format's KeyError with the template, the missing key, and the available draft fields.
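A sketch of that wrapping, assuming the draft is available as a plain dict. render_source_url and its parameters are hypothetical names; the message shape follows the output above.

```python
def render_source_url(template: str, draft: dict, model_name: str) -> str:
    """Fill the template from draft fields, with a descriptive error on a miss."""
    try:
        return template.format(**draft)
    except KeyError as exc:
        missing = exc.args[0]  # str.format reports the missing field name
        raise KeyError(
            f"source_url template {template!r} references field {missing!r} "
            f"which is not on {model_name} (available: {sorted(draft)})"
        ) from exc


draft = {"name": "Douglas Adams", "wid": "Q42"}
url = render_source_url("https://example.com/{wid}", draft, "X")

try:
    render_source_url("https://example.com/{nonexistent_field}", draft, "X")
except KeyError as exc:
    error_text = exc.args[0]
```

Reading exc.args[0] avoids the extra quoting that str() applies to a KeyError message.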

Compat (push-time)

  field 'cik': pattern '^\d{10}$' → '^\d{8}$'
  required bump: major

CompatError rendering: each breaking change on its own line, then the message in red.

  major bump 1.0.0 → 2.0.0 requires --force-major

Pass --force-major if you mean it.

  archie/us.sec.companies@1.0.0 already published with a different schema; bump schema_version

Same-version re-publish with any schema change (breaking or additive).

Fixture failure

$ esker test
  global.spacex.rockets@1.0.0
  0 passed · 2 failed · 0.0s

  mismatch: falcon1
  --- expected
  +++ actual
  @@ ...

See Fixtures for the four failure reasons.


Footguns

Non-obvious behaviors. Worth knowing in advance.

schema_version: SemVer = "1.0.0" silently breaks the model

The most-flagged pitfall. Writing it as a Pydantic field instead of a ClassVar turns schema_version into a per-record column, breaks declared_version(), and pollutes the JSON Schema. Always use:

schema_version: ClassVar[str] = "2.0.0"

The decorator path bypasses this trap entirely.
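The field-versus-ClassVar distinction can be seen with standard-library dataclasses, which treat ClassVar annotations the same way: they are left out of the record's fields. This is an analogy, not the SDK's code; AsField and AsClassVar are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class AsField:
    cik: str
    schema_version: str = "1.0.0"  # plain annotation: becomes a per-record field


@dataclass
class AsClassVar:
    cik: str
    schema_version: ClassVar[str] = "1.0.0"  # ClassVar: stays on the class


field_names = [f.name for f in fields(AsField)]
classvar_names = [f.name for f in fields(AsClassVar)]
```

In the first class schema_version travels with every record; in the second it is class-level metadata and the record has only cik.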

Two consecutive runs produce different content_hash

Because Fetched.fetched_at and per-batch lineage_id change every run, the parquet bytes change, and content_hash changes. The compat engine doesn't care — it diffs JSON Schemas, not parquet bytes — but a user expecting identical bytes for identical inputs will be surprised.

The supersedes chain is the right way to think about re-publishes: each push is a new run with a new content hash, linked back to the previous run at the same version.
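The effect is easy to reproduce in miniature. The sketch below hashes JSON rather than parquet and uses a hypothetical content_hash function, so it shows the principle and not the SDK's actual hashing: any per-run value inside the payload changes the digest.

```python
import hashlib
import json
from datetime import datetime, timezone


def content_hash(records: list, fetched_at: datetime) -> str:
    """Hash records plus run metadata (JSON stands in for parquet bytes)."""
    payload = json.dumps(
        {"fetched_at": fetched_at.isoformat(), "records": records},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


records = [{"cik": "0000320193", "name": "Apple Inc."}]
run_1 = content_hash(records, datetime(2024, 1, 1, tzinfo=timezone.utc))
run_2 = content_hash(records, datetime(2024, 1, 2, tzinfo=timezone.utc))
same_ts = content_hash(records, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Here run_1 and run_2 differ even though the records are identical, while same_ts matches run_1 because the timestamp was held fixed.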

esker sync reports drift but doesn't fix it

esker sync prints <name> · hash drift · run 'esker upgrade <name>' when the lockfile's content_hash differs from the hub's latest. It doesn't auto-upgrade. Run upgrade per drifted name.

That's intentional — drift is a security signal, not a routine event — but worth knowing.

esker config set-owner doesn't write anything

It validates the handle and prints a paste-able snippet. If you're expecting state mutation, you'll be surprised.

esker config set-handle makes the local creds stale

It only PATCHes the server. The local ~/.esker/credentials owner_handle field is unchanged. Subsequent pushes use the cached old handle until you re-login. The success message says so but it's easy to miss.

BulkJsonSource re-fetches on every run

No bulk-cache primitive. Per-id sources can use fetch_cached; bulk sources hit the network every time. Big payloads get re-downloaded for every test run.

BulkJsonSource.SOURCE_ID = "bulk-json" (default)

If you subclass BulkJsonSource directly without setting SOURCE_ID, the manifest will record source_id="bulk-json" — meaningless. The decorator path overrides this to domain_id, so you only hit it if you go three-class with BulkJsonSource as base. Always set SOURCE_ID explicitly.

esker list reads ./output/<domain>.parquet for "last run"

The "last run" timestamp comes from the mtime of ./output/<domain>.parquet. The path is fixed and ignores --output, so a user who runs with -o data/ will always see "never run".
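A sketch of a fixed-path lookup of this kind, to make the failure mode concrete. last_run and its base parameter are hypothetical; the temporary directory stands in for a project root.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def last_run(domain: str, base: Path = Path("output")) -> str:
    """Report the parquet file's mtime, or 'never run' if it is absent."""
    path = base / f"{domain}.parquet"
    if not path.exists():
        return "never run"
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "us.sec.companies.parquet").write_bytes(b"")
    # The run wrote to data/, but the lookup only checks output/
    seen = last_run("us.sec.companies", base=root / "output")

    (root / "output").mkdir()
    (root / "output" / "us.sec.companies.parquet").write_bytes(b"")
    seen_after = last_run("us.sec.companies", base=root / "output")
```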

JWT signature is not verified by the SDK

The SDK reads only the JWT's exp claim; it does not verify the signature. The server validates the token on every request, so a tampered token passes local checks until it reaches an authenticated endpoint.
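Reading exp without verification amounts to base64url-decoding the middle segment of the token. The sketch below is illustrative: read_exp and _b64 are hypothetical helpers, and the tokens are built locally with placeholder signatures.

```python
import base64
import json


def read_exp(token: str) -> int:
    """Decode the payload segment and return exp; the signature is ignored."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return int(claims["exp"])


def _b64(obj: dict) -> str:
    # Encode a dict as an unpadded base64url JSON segment
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


exp = 1893456000  # fixed example expiry
header = _b64({"alg": "HS256", "typ": "JWT"})
token = f"{header}.{_b64({'exp': exp})}.placeholder-signature"
tampered = token.rsplit(".", 1)[0] + ".forged-signature"
```

Both token and tampered yield the same exp, which is why a local expiry check says nothing about authenticity.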

Pattern-constrained strings lose their pattern in Arrow

Annotated[str, Field(pattern=r"^\d{10}$")] (CIK) renders as Arrow string with no constraint metadata. Pydantic validates on construction; parquet is inert.

Literal of one value confuses the compat checker

Pydantic emits Literal["x"] as {"const": "x", "type": "string"} (no enum keyword). But Literal["x", "y"] is {"enum": ["x", "y"], "type": "string"}.

A transition from Literal["x", "y"] to Literal["x"] therefore shows as "enum keyword added or removed" (breaking) rather than "enum values removed". The classification is right; the message could be clearer.
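A keyword-level diff reproduces the behavior. The sketch below is not the compat engine: diff_enum is a hypothetical function operating on the two schema fragments quoted above, written only to show why the keyword check fires before any value comparison.

```python
def diff_enum(old: dict, new: dict) -> str:
    """Compare the enum keyword of two JSON Schema fragments."""
    had, has = "enum" in old, "enum" in new
    if had != has:
        # The keyword itself appeared or vanished; values are never compared
        return "enum keyword added or removed"
    if had and has:
        removed = set(old["enum"]) - set(new["enum"])
        if removed:
            return f"enum values removed: {sorted(removed)}"
    return "no enum change"


two_values = {"enum": ["x", "y"], "type": "string"}
one_value = {"const": "x", "type": "string"}  # single-value Literal uses const

literal_narrowed = diff_enum(two_values, one_value)
enum_narrowed = diff_enum(two_values, {"enum": ["x"], "type": "string"})
```

The second call shows the message a reader would probably expect, which only appears when both sides still use the enum keyword.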

See also