Getting started
Install Esker, write a pipeline, test it, publish it, consume it. End to end in five minutes.
This walkthrough builds a pipeline that publishes the SEC's company-ticker file.
Install
Esker requires Python 3.12+. The toolchain assumes uv for dependency management — pip works too.
With uv:
uv init my-pipelines
cd my-pipelines
uv add esker
With pip:
mkdir my-pipelines && cd my-pipelines
python -m venv .venv && source .venv/bin/activate
pip install esker
The package wires two console scripts that point at the same CLI: esker and the shorter esk. Invoke either directly:
esker --help
Sign in
Authoring works offline. Publishing needs an account.
esker login
Browser opens, you sign in, the CLI prints:
signed in as you@example.com · publishing as you
Credentials land at ~/.esker/credentials (mode 0600). See Auth for the full flow and env-var overrides.
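To confirm the credentials file is readable only by your user, a check like the one below works on POSIX systems. It is a sketch and not part of Esker: the helper name is_private is hypothetical, and only the path and the 0600 mode come from the text above.

```python
import os
import stat
from pathlib import Path


def is_private(path: Path) -> bool:
    """Return True if the permission bits are exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


if __name__ == "__main__":
    creds = Path.home() / ".esker" / "credentials"  # path taken from the text above
    if creds.exists():
        print("private" if is_private(creds) else "permissions are wider than 0600")
    else:
        print("not signed in yet")
```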
Write the pipeline
Create src/my_pipelines/sec_companies.py:
from typing import Annotated

from pydantic import Field

from esker import pipeline


@pipeline(
    "us.sec.companies@1.0.0",
    url="https://www.sec.gov/files/company_tickers.json",
    entity_type="corp",
    key="cik",
    source_url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik}",
    cadence="daily",
)
class SecCompany:
    cik: Annotated[str, Field(pattern=r"^\d{10}$")]
    ticker: Annotated[str, Field(min_length=1, max_length=10)]
    title: str

    @classmethod
    def transform(cls, raw: dict) -> "SecCompany":
        return cls(
            cik=str(raw["cik_str"]).zfill(10),
            ticker=raw["ticker"],
            cik_title=None if False else raw["title"],
        ) if False else cls(
            cik=str(raw["cik_str"]).zfill(10),
            ticker=raw["ticker"],
            title=raw["title"],
        )
The decorator parses <domain>@<semver>, wraps the class as an EskerModel, synthesizes a BulkJsonSource from url=, builds an EskerPipeline, and registers it. You write the record shape and the per-row transform; everything else is generated.
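The first step the decorator performs, splitting <domain>@<semver>, can be sketched with the standard library. This is an illustration under assumptions, not Esker's parser: the names PipelineRef and parse_ref and the regular expression are hypothetical, and the pattern accepts only a plain MAJOR.MINOR.PATCH version, without pre-release or build tags.

```python
import re
from typing import NamedTuple


class PipelineRef(NamedTuple):
    domain: str
    version: tuple[int, int, int]


# Assumed shape: dotted lowercase domain, "@", then MAJOR.MINOR.PATCH.
_REF = re.compile(
    r"(?P<domain>[a-z0-9_]+(?:\.[a-z0-9_]+)*)"
    r"@(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)


def parse_ref(ref: str) -> PipelineRef:
    """Split 'domain@x.y.z' into a domain string and an integer version triple."""
    match = _REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"expected <domain>@<semver>, got {ref!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return PipelineRef(match["domain"], version)


print(parse_ref("us.sec.companies@1.0.0"))
```

Keeping the version as a tuple of integers means versions compare numerically, so (1, 10, 0) sorts after (1, 9, 0).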
Three injected fields — esker_id, esker_source_url, esker_lineage_id — land on each record at run time. You never set them yourself. See Records for the full mechanism.
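To make the injected fields concrete, here is a guess at how two of them could be derived from the decorator arguments. The esker_id layout is inferred from the single example ID that appears later in this guide (esker:us:corp:0000320193), and the function sketch_injected is hypothetical. esker_lineage_id is left out because nothing here shows how it is built. Records is the authoritative description.

```python
def sketch_injected(
    domain: str, entity_type: str, key: str, source_url: str, record: dict
) -> dict[str, str]:
    """Derive illustrative esker_id and esker_source_url values for one record."""
    key_value = record[key]
    # Assumption: the first segment of the domain ("us") is the ID prefix.
    region = domain.split(".", 1)[0]
    return {
        "esker_id": f"esker:{region}:{entity_type}:{key_value}",
        # The source_url template uses the key name as its placeholder, e.g. {cik}.
        "esker_source_url": source_url.format(**{key: key_value}),
    }


fields = sketch_injected(
    domain="us.sec.companies",
    entity_type="corp",
    key="cik",
    source_url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik}",
    record={"cik": "0000320193", "ticker": "AAPL", "title": "Apple Inc."},
)
print(fields["esker_id"])
print(fields["esker_source_url"])
```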
Register the entry point
Esker discovers pipelines via importlib.metadata. Add to pyproject.toml:
[project]
dependencies = ["esker"]
[project.entry-points."esker.pipelines"]
sec_companies = "my_pipelines.sec_companies"
After editing entry points, reinstall the package so the metadata refreshes:
uv pip install -e . --reinstall-package my-pipelines
Confirm the pipeline shows up:
$ esker list
us.sec.companies 1.0.0 daily never run
Run it locally
$ esker run us.sec.companies
us.sec.companies@1.0.0
10,348 records · 2.1s · output/us.sec.companies.parquet
Two files land in ./output/:
output/
├── us.sec.companies.parquet
└── us.sec.companies.lineage.json
The parquet has your three author fields plus the three injected esker_* columns. The lineage JSON records what was fetched, when, and from where. See Lineage for the format.
Add a fixture
A fixture is a (raw_*.json, expected_*.json) pair. The harness diffs transform(raw).model_dump(mode="json") against expected.
src/my_pipelines/sec_companies_fixtures/raw_basic.json:
{
"cik_str": 320193,
"ticker": "AAPL",
"title": "Apple Inc."
}
Run with --update to materialize the expected file:
$ esker test us.sec.companies --update
us.sec.companies@1.0.0
wrote expected_basic.json
Re-run to confirm:
$ esker test us.sec.companies
us.sec.companies@1.0.0
1 passed · 0.0s
esker push refuses to run if you have zero fixtures or any failing fixture. --force-untested bypasses the gate when you genuinely want to. See Fixtures for layouts and conventions.
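The comparison the harness performs can be approximated in a few lines. This is a simplified stand-in, not Esker's implementation: transform here is a plain function returning the dict that SecCompany.transform(raw).model_dump(mode="json") would produce, and run_fixtures with its update flag is a hypothetical name mirroring --update.

```python
import json
from pathlib import Path
from typing import Callable


def transform(raw: dict) -> dict:
    """Stand-in for SecCompany.transform(raw).model_dump(mode="json")."""
    return {
        "cik": str(raw["cik_str"]).zfill(10),
        "ticker": raw["ticker"],
        "title": raw["title"],
    }


def run_fixtures(
    fixture_dir: Path, fn: Callable[[dict], dict], update: bool = False
) -> list[str]:
    """Return names of failing fixtures; with update=True, write expected files instead."""
    failures = []
    for raw_path in sorted(fixture_dir.glob("raw_*.json")):
        name = raw_path.stem.removeprefix("raw_")
        expected_path = raw_path.with_name(f"expected_{name}.json")
        actual = fn(json.loads(raw_path.read_text()))
        if update:
            expected_path.write_text(json.dumps(actual, indent=2) + "\n")
        elif not expected_path.exists() or json.loads(expected_path.read_text()) != actual:
            # A missing expected file counts as a failure, like a mismatch.
            failures.append(name)
    return failures
```

Comparing parsed JSON rather than raw text keeps the check insensitive to whitespace and key order in the expected file.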
Check schema compatibility
Before pushing, see what the hub thinks of the schema diff:
$ esker check us.sec.companies
you/us.sec.companies
1.0.0 · no prior version
First publish — nothing to compare against. After v1.0.0 is up, subsequent check runs report breaking vs additive changes and the minimum required SemVer bump. Push runs the same gate. Read Compatibility for the full classification rules.
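As a rough mental model of the breaking-versus-additive distinction, consider the sketch below. The rules in it are assumptions chosen for illustration (removing or retyping a field is breaking, adding a field is additive, anything else needs at most a patch bump), and classify and bump are hypothetical names. The hub's actual rules are in Compatibility.

```python
def classify(old: dict[str, str], new: dict[str, str]) -> str:
    """Return the minimum bump ("major", "minor", or "patch") for a schema change.

    Schemas are modeled as {field name: type name}.
    """
    removed = old.keys() - new.keys()
    retyped = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    if removed or retyped:
        return "major"  # assumed breaking: consumers may rely on these fields
    if new.keys() - old.keys():
        return "minor"  # assumed additive: existing consumers keep working
    return "patch"


def bump(version: str, level: str) -> str:
    """Apply a SemVer bump of the given level to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1 = {"cik": "str", "ticker": "str", "title": "str"}
v2 = {**v1, "exchange": "str"}  # hypothetical new field
print(bump("1.0.0", classify(v1, v2)))  # prints 1.1.0
```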
Push
$ esker push us.sec.companies
you/us.sec.companies@1.0.0
10,348 records · 2.1s · output/us.sec.companies.parquet
pushed you/us.sec.companies@1.0.0
Six artifacts land on the hub: data.parquet, schema.json, schema.arrow, schema.d.ts, lineage.json, manifest.json. From this moment your dataset is at esker.so/you/us.sec.companies.
Consume it
In another project (or the same one), bind the dataset:
$ esker add you/us.sec.companies
us.sec.companies → you/us.sec.companies@1.0.0
pyproject.toml · esker.lock
esker add writes a binding into pyproject.toml [tool.esker.datasets] and pins the resolved version in esker.lock. Now bare-name lookups work:
import esker
frame = esker.get("us.sec.companies")
print(frame.head())
esker.get resolves the bare name through bindings, fetches the manifest, downloads the parquet (cached at ~/.esker/cache/<owner>/<name>/<version>/), verifies its content hash, and returns a polars DataFrame.
For one record by entity ID:
apple = esker.get_one("us.sec.companies", esker_id="esker:us:corp:0000320193")
For an equality filter:
techs = esker.search("us.sec.companies", ticker="AAPL")
See Reading for the full surface.
Where to go next
- Pipelines — every decorator option.
- Three-class form — when the decorator isn't enough.
- Manifests — what the hub stores per release.
- CLI overview — every command, every flag.