Getting started

Install Esker, write a pipeline, test it, publish it, consume it. End to end in five minutes.

This walkthrough builds a pipeline that publishes the SEC's company-ticker file. About five minutes.

Install

Esker requires Python 3.12+. The examples use uv for dependency management, but pip works too.

With uv:

uv init my-pipelines
cd my-pipelines
uv add esker

With pip:

mkdir my-pipelines && cd my-pipelines
python -m venv .venv && source .venv/bin/activate
pip install esker

The package wires two console scripts that point at the same CLI: esker and the shorter esk. Invoke either directly:

esker --help

Authoring works offline. Publishing needs an account.

esker login

A browser opens, you sign in, and the CLI prints:

  signed in as you@example.com · publishing as you

Credentials land at ~/.esker/credentials (mode 0600). See Auth for the full flow and env-var overrides.
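If you want to confirm the permissions yourself, a minimal sketch along these lines checks that the file is readable and writable by its owner only. The helper name is_private is hypothetical and not part of Esker; the path and the 0600 mode come from the text above.

```python
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """Return True when the permission bits are exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode == 0o600


credentials = Path.home() / ".esker" / "credentials"
if credentials.exists():
    print(credentials, "ok" if is_private(credentials) else "permissions are wider than 0600")
else:
    print("no credentials file yet; run esker login first")
```

The check is meaningful on POSIX systems only; Windows reports permission bits differently.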

Write the pipeline

Create src/my_pipelines/sec_companies.py:

from typing import Annotated
from pydantic import Field
from esker import pipeline


@pipeline(
    "us.sec.companies@1.0.0",
    url="https://www.sec.gov/files/company_tickers.json",
    entity_type="corp",
    key="cik",
    source_url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik}",
    cadence="daily",
)
class SecCompany:
    cik: Annotated[str, Field(pattern=r"^\d{10}$")]
    ticker: Annotated[str, Field(min_length=1, max_length=10)]
    title: str

    @classmethod
    def transform(cls, raw: dict) -> "SecCompany":
        # cik_str arrives as an integer; zero-pad it to the 10-digit form the cik pattern requires.
        return cls(
            cik=str(raw["cik_str"]).zfill(10),
            ticker=raw["ticker"],
            title=raw["title"],
        )

The decorator parses <domain>@<semver>, wraps the class as an EskerModel, synthesizes a BulkJsonSource from url=, builds an EskerPipeline, and registers it. You write the record shape and the per-row transform; everything else is generated.
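To make the first step concrete, here is a rough sketch of splitting a <domain>@<semver> identifier. It is not Esker's parser: the function name split_pipeline_id and the regular expression are assumptions, and the pattern accepts plain MAJOR.MINOR.PATCH versions only.

```python
import re

# Assumption: lowercase dotted domain, then "@", then MAJOR.MINOR.PATCH.
# Pre-release and build-metadata suffixes are deliberately not handled here.
_ID_PATTERN = re.compile(r"^(?P<domain>[a-z0-9_.]+)@(?P<version>\d+\.\d+\.\d+)$")


def split_pipeline_id(pipeline_id: str) -> tuple[str, tuple[int, int, int]]:
    """Split '<domain>@<semver>' into the domain and a (major, minor, patch) tuple."""
    match = _ID_PATTERN.match(pipeline_id)
    if match is None:
        raise ValueError(f"expected <domain>@<semver>, got {pipeline_id!r}")
    major, minor, patch = (int(part) for part in match["version"].split("."))
    return match["domain"], (major, minor, patch)


domain, version = split_pipeline_id("us.sec.companies@1.0.0")
print(domain, version)  # us.sec.companies (1, 0, 0)
```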

Three injected fields — esker_id, esker_source_url, esker_lineage_id — land on each record at run time. You never set them yourself. See Records for the full mechanism.
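As an illustration only, two of the injected values can be reproduced from the decorator arguments. The sketch assumes the ID layout esker:<first domain segment>:<entity_type>:<key value>, inferred from the example ID esker:us:corp:0000320193 used later in this walkthrough, and assumes the source URL is the source_url template filled with the key field. Both function names are hypothetical, and esker_lineage_id is not sketched because the text does not say how it is formed.

```python
def derive_esker_id(domain: str, entity_type: str, key_value: str) -> str:
    """Build an ID shaped like 'esker:us:corp:0000320193' (assumed layout)."""
    # Assumption: the second segment is the first dotted part of the domain.
    region = domain.split(".")[0]
    return f"esker:{region}:{entity_type}:{key_value}"


def derive_source_url(template: str, key: str, key_value: str) -> str:
    """Fill the '{<key>}' placeholder in the source_url template."""
    return template.format(**{key: key_value})


template = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik}"
print(derive_esker_id("us.sec.companies", "corp", "0000320193"))
print(derive_source_url(template, "cik", "0000320193"))
```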

Register the entry point

Esker discovers pipelines via importlib.metadata. Add to pyproject.toml:

[project]
dependencies = ["esker"]

[project.entry-points."esker.pipelines"]
sec_companies = "my_pipelines.sec_companies"

After editing entry points, reinstall the package so the metadata refreshes:

uv pip install -e . --reinstall-package my-pipelines

Confirm the pipeline shows up:

$ esker list
  us.sec.companies  1.0.0  daily  never run

Run it locally

$ esker run us.sec.companies
  us.sec.companies@1.0.0
  10,348 records · 2.1s · output/us.sec.companies.parquet

Two files land in ./output/:

output/
├── us.sec.companies.parquet
└── us.sec.companies.lineage.json

The parquet has your three author fields plus the three injected esker_* columns. The lineage JSON records what was fetched, when, and from where. See Lineage for the format.

Add a fixture

A fixture is a (raw_*.json, expected_*.json) pair. The harness diffs transform(raw).model_dump(mode="json") against expected.

src/my_pipelines/sec_companies_fixtures/raw_basic.json:

{
  "cik_str": 320193,
  "ticker": "AAPL",
  "title": "Apple Inc."
}

Run with --update to materialize the expected file:

$ esker test us.sec.companies --update
  us.sec.companies@1.0.0
  wrote expected_basic.json

Re-run to confirm:

$ esker test us.sec.companies
  us.sec.companies@1.0.0
  1 passed · 0.0s
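The comparison the harness performs can be approximated in a few lines. This is a standard-library stand-in, not the harness itself: transform below mimics SecCompany.transform(raw).model_dump(mode="json") with a plain dict, and the expected record is an assumption about what expected_basic.json holds for the three author fields.

```python
import json


def transform(raw: dict) -> dict:
    """Stand-in for SecCompany.transform(raw).model_dump(mode="json")."""
    return {
        "cik": str(raw["cik_str"]).zfill(10),
        "ticker": raw["ticker"],
        "title": raw["title"],
    }


def diff_fixture(raw: dict, expected: dict) -> dict[str, tuple[object, object]]:
    """Return {field: (actual, expected)} for every field whose values differ."""
    actual = transform(raw)
    fields = actual.keys() | expected.keys()
    return {
        name: (actual.get(name), expected.get(name))
        for name in sorted(fields)
        if actual.get(name) != expected.get(name)
    }


raw = json.loads('{"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}')
expected = {"cik": "0000320193", "ticker": "AAPL", "title": "Apple Inc."}
print(diff_fixture(raw, expected))  # an empty dict means the fixture passes
```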

esker push refuses to run if you have zero fixtures or any failing fixture. --force-untested bypasses the gate when you genuinely want to. See Fixtures for layouts and conventions.
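Read literally, the gate reduces to a small predicate. The sketch below is one interpretation of the sentence above, with a hypothetical function name; it assumes --force-untested skips the whole gate, including failing fixtures.

```python
def push_allowed(passed: int, failed: int, force_untested: bool = False) -> bool:
    """Return True when a push may proceed under the fixture gate."""
    if force_untested:
        # Assumption: the flag bypasses the gate entirely.
        return True
    has_fixtures = passed + failed > 0
    return has_fixtures and failed == 0


print(push_allowed(passed=1, failed=0))  # True
print(push_allowed(passed=0, failed=0))  # False
```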

Check schema compatibility

Before pushing, see what the hub thinks of the schema diff:

$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 · no prior version

First publish — nothing to compare against. After v1.0.0 is up, subsequent check runs report breaking vs additive changes and the minimum required SemVer bump. Push runs the same gate. Read Compatibility for the full classification rules.
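To give a feel for what breaking versus additive means, here is a simplified sketch that compares two field-to-type mappings. The real rules live on the Compatibility page; this version assumes the conventional SemVer reading (removed or retyped fields need a major bump, new fields a minor bump), and the function name and the exchange field are hypothetical.

```python
def minimum_bump(old: dict[str, str], new: dict[str, str]) -> str:
    """Return 'major', 'minor', or 'patch' for a schema change (simplified rules)."""
    removed = old.keys() - new.keys()
    retyped = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    added = new.keys() - old.keys()
    if removed or retyped:
        return "major"  # existing consumers could break
    if added:
        return "minor"  # additive change
    return "patch"      # schema unchanged


v1 = {"cik": "string", "ticker": "string", "title": "string"}
print(minimum_bump(v1, {**v1, "exchange": "string"}))  # minor
print(minimum_bump(v1, {"cik": "string", "ticker": "string"}))  # major
```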

Push

$ esker push us.sec.companies
  you/us.sec.companies@1.0.0
  10,348 records · 2.1s · output/us.sec.companies.parquet
  pushed you/us.sec.companies@1.0.0

Six artifacts land on the hub: data.parquet, schema.json, schema.arrow, schema.d.ts, lineage.json, manifest.json. From this moment your dataset is at esker.so/you/us.sec.companies.

Consume it

In another project (or the same one), bind the dataset:

$ esker add you/us.sec.companies
  us.sec.companies → you/us.sec.companies@1.0.0
  pyproject.toml · esker.lock

esker add writes a binding into pyproject.toml [tool.esker.datasets] and pins the resolved version in esker.lock. Now bare-name lookups work:

import esker

frame = esker.get("us.sec.companies")
print(frame.head())

esker.get resolves the bare name through bindings, fetches the manifest, downloads the parquet (cached at ~/.esker/cache/<owner>/<name>/<version>/), content-hash verifies, and hands you a polars DataFrame.

For one record by entity ID:

apple = esker.get_one("us.sec.companies", esker_id="esker:us:corp:0000320193")

For an equality filter:

apple_rows = esker.search("us.sec.companies", ticker="AAPL")
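An equality filter keeps the rows where every named field equals the given value. The plain-Python sketch below shows that semantics on a list of dicts; it is not how Esker implements search, the function name equality_filter is hypothetical, and the two rows are illustrative sample data.

```python
def equality_filter(records: list[dict], **filters: object) -> list[dict]:
    """Keep records whose fields equal every keyword argument."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in filters.items())
    ]


rows = [
    {"cik": "0000320193", "ticker": "AAPL", "title": "Apple Inc."},
    {"cik": "0000789019", "ticker": "MSFT", "title": "Microsoft Corp"},
]
print(equality_filter(rows, ticker="AAPL"))
```

Passing several keyword arguments narrows the result further, since every condition has to hold.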

See Reading for the full surface.

Where to go next

Pipelines — every decorator option.
Three-class form — when the decorator isn't enough.
Manifests — what the hub stores per release.
CLI overview — every command, every flag.
