Fixtures

The test harness. Pure transform comparisons against canonical JSON.

A fixture is a (raw_<label>.json, expected_<label>.json) pair. The harness diffs pipeline.transform(raw).model_dump(mode="json") against expected. No network, no clocks, no RNG.

esker test runs fixtures locally. esker push runs the same harness as a pre-seal gate: it refuses to proceed if any fixture fails, or if the pipeline has no fixtures and you did not pass --force-untested.

Layout

The harness probes two layouts, in order:

Package form: <pipeline_package>/fixtures/. Used when the pipeline is a package (multiple files, one of them __init__.py). Convention for the three-class form.

my_pipelines/us_treasury_yields/
├── __init__.py
├── source.py
├── schema.py
├── pipeline.py
└── fixtures/
    ├── raw_basic.json
    └── expected_basic.json

Single-file form: <pipeline_file>_fixtures/ sibling dir. Used when the pipeline is a single .py file. Convention for the decorator form.

my_pipelines/
├── sec_companies.py
└── sec_companies_fixtures/
    ├── raw_basic.json
    ├── expected_basic.json
    ├── raw_short_cik.json
    └── expected_short_cik.json

The first existing layout wins. Missing fixtures dir → esker test prints no fixtures (dim) and treats the pipeline as untested-but-not-failed. esker push treats no-fixtures as a failure unless you pass --force-untested.
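The probe order can be sketched with pathlib. This is a minimal illustration, not the harness's actual code: the name find_fixtures_dir is hypothetical, and it assumes the harness starts from the path of the pipeline's module file.

```python
from pathlib import Path
from typing import Optional


def find_fixtures_dir(module_file: Path) -> Optional[Path]:
    """Return the first existing fixtures directory, or None if neither exists.

    Hypothetical helper: `module_file` is assumed to be the .py file that
    defines the pipeline (for a package, any module inside the package).
    """
    candidates = [
        # Package form: <pipeline_package>/fixtures/
        module_file.parent / "fixtures",
        # Single-file form: <pipeline_file>_fixtures/ next to the .py file
        module_file.with_name(module_file.stem + "_fixtures"),
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate  # first existing layout wins
    return None
```

A None result corresponds to the "no fixtures" case: untested-but-not-failed for esker test, a failure for esker push without --force-untested.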

File naming

raw_<label>.json        the raw input (whatever your source yields as Fetched.raw)
expected_<label>.json   the expected transform(raw).model_dump(mode="json")

<label> is arbitrary. Use names that describe the case: basic, falcon1, short_cik, null_rate. The harness pairs files by the <label> portion.
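Pairing by label might look roughly like the sketch below. FixturePair and pair_by_label are hypothetical names for illustration. The function works on plain file names so it can be tried without touching disk.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class FixturePair(NamedTuple):
    raw: Optional[str]       # file name of raw_<label>.json, if present
    expected: Optional[str]  # file name of expected_<label>.json, if present


def pair_by_label(file_names: Iterable[str]) -> Dict[str, FixturePair]:
    """Group fixture file names by their <label> portion (hypothetical helper)."""
    raws: Dict[str, str] = {}
    expecteds: Dict[str, str] = {}
    for name in file_names:
        if not name.endswith(".json"):
            continue  # ignore anything that is not a JSON fixture
        stem = name[: -len(".json")]
        if stem.startswith("raw_"):
            raws[stem[len("raw_"):]] = name
        elif stem.startswith("expected_"):
            expecteds[stem[len("expected_"):]] = name
    labels = sorted(set(raws) | set(expecteds))
    return {label: FixturePair(raws.get(label), expecteds.get(label)) for label in labels}
```

A label whose pair has only one side filled in is what the failure reasons missing_expected and orphan_expected report.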

Canonical JSON

Both reading and writing use canonical JSON: indent=2, sort_keys=True, ensure_ascii=False. So expected_*.json files are stably ordered and diff-friendly.
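Those three settings map directly onto json.dumps arguments in the standard library. A minimal sketch (canonical_dumps is a hypothetical name, and whether the harness appends a trailing newline is not specified here):

```python
import json
from typing import Any


def canonical_dumps(value: Any) -> str:
    """Serialize with stable key order, 2-space indent, and unescaped Unicode."""
    return json.dumps(value, indent=2, sort_keys=True, ensure_ascii=False)


# Insertion order of the dict does not matter: keys come out sorted.
record = {"title": "Apple Inc.", "cik": "0000320193", "ticker": "AAPL"}
print(canonical_dumps(record))
```

Because keys are sorted, two dicts with the same content always serialize to the same text, which is what keeps fixture diffs small.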

Example expected_basic.json for the SEC pipeline:

{
  "cik": "0000320193",
  "ticker": "AAPL",
  "title": "Apple Inc."
}

Note: no esker_id, no esker_source_url, no esker_lineage_id, no schema_version. Those are injected at pipeline.run() time, not by transform. Fixtures only contain author-domain fields.

What gets compared

actual = pipeline.transform(raw).model_dump(mode="json")

That's the draft model (no esker_* fields). mode="json" means date → ISO string, datetime → ISO string, UUID → string. The same form that lands in parquet.

transform must be a pure function. The harness compares bytes — any clock, RNG, or env-dependent value will fail.
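The comparison step can be approximated with the standard library. This is a sketch under assumptions, not the harness's implementation: diff_against_expected is a hypothetical name, and it assumes both sides are serialized to canonical JSON before comparing.

```python
import difflib
import json
from typing import Any, Optional


def diff_against_expected(actual: Any, expected: Any) -> Optional[str]:
    """Return None when the canonical forms match, else a unified diff."""

    def canon(value: Any) -> str:
        return json.dumps(value, indent=2, sort_keys=True, ensure_ascii=False)

    actual_text, expected_text = canon(actual), canon(expected)
    if actual_text == expected_text:
        return None
    lines = difflib.unified_diff(
        expected_text.splitlines(),
        actual_text.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return "\n".join(lines)
```

A transform that embeds the current time, a random value, or an environment variable produces different text on each run, so this kind of exact comparison cannot pass reliably.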

Failure reasons

reason             when
mismatch           actual ≠ expected. Detail: unified diff (fromfile=expected, tofile=actual).
raised             transform(raw) threw. Detail: <Type>: <msg>.
missing_expected   raw_<label>.json exists, no matching expected_<label>.json, and not --update.
orphan_expected    expected_<label>.json exists, no matching raw_<label>.json.

orphan_expected catches the common mistake of renaming a raw_*.json and forgetting the matching expected (or vice versa).

--update

no expected file       → write it
mismatch               → overwrite
existing matching      → leave alone
orphan_expected        → still flagged as failed (even with --update)

Destructive — overwrites without prompting. The standard workflow is "make a code change, run with --update, inspect the git diff."
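The rules above amount to a small decision table. The sketch below encodes it as a function. resolve_fixture is a hypothetical name, and the outcome strings "passed" and "wrote" are borrowed from the report fields as an assumption about how results are labelled. The raised case is left out because it depends on running transform.

```python
def resolve_fixture(has_raw: bool, has_expected: bool, matches: bool, update: bool) -> str:
    """Map one label's state to an outcome (hypothetical helper).

    `matches` is only meaningful when both files exist.
    """
    if not has_raw and not has_expected:
        raise ValueError("a label needs at least one fixture file")
    if not has_raw:
        # No raw input means nothing to transform, so --update cannot
        # regenerate anything: still a failure.
        return "orphan_expected"
    if not has_expected:
        return "wrote" if update else "missing_expected"
    if matches:
        return "passed"  # existing matching file is left alone
    return "wrote" if update else "mismatch"
```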

Running

$ esker test us.sec.companies
  us.sec.companies@2.0.0
  2 passed · 0.0s

No domain → iterate every registered pipeline:

$ esker test
  global.spacex.rockets@1.0.0
  2 passed · 0.0s

  us.sec.companies@2.0.0
  2 passed · 0.0s

  us.treasury.yields@2.0.0
  2 passed · 0.0s

On failure:

  us.sec.companies@2.0.0
  1 passed · 1 failed · 0.0s

  mismatch: short_cik
  --- expected
  +++ actual
  @@ -1,5 +1,5 @@
   {
  -  "cik": "0000320193",
  +  "cik": "320193",
     "ticker": "AAPL",
     "title": "Apple Inc."
   }

Exit 0 if every fixture passed. Exit 1 if anything failed.

Programmatic API

from esker import run_fixtures, FixtureReport, FixtureFailure


report: FixtureReport = run_fixtures(MyPipeline, update=False)
report.ok            # bool
report.passed        # list[str] of labels
report.failed        # list[FixtureFailure]
report.wrote         # list[str] of labels (for --update mode)

The CLI uses this directly, as does esker push's pre-seal gate.
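A caller that wants its own summary line or exit code only needs the four attributes listed above. The helpers below are a hypothetical sketch: summarize and exit_code are not part of esker, the summary format is invented for illustration, and the report is duck-typed so the code runs without esker installed.

```python
from typing import Any


def summarize(report: Any) -> str:
    """Build a one-line summary from report.passed, report.failed, report.wrote."""
    parts = [f"{len(report.passed)} passed"]
    if report.failed:
        parts.append(f"{len(report.failed)} failed")
    if report.wrote:
        parts.append(f"{len(report.wrote)} wrote")
    return " · ".join(parts)


def exit_code(report: Any) -> int:
    """Follow the CLI convention: 0 when every fixture passed, 1 otherwise."""
    return 0 if report.ok else 1
```

In a script, the report returned by run_fixtures(MyPipeline, update=False) could be passed to both helpers, with the result of exit_code handed to sys.exit.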

See also