Overview

What Esker is, what it does, and the three nouns to hold in your head.

Esker is trying to become the default publish target for normalized public data. GitHub for code; Hugging Face for models; Esker for data.

This is the documentation for the protocol, the Python SDK, the CLI, and the hub. The SDK is what `pip install esker` gives you; it is everything you need to author, publish, and consume datasets from a script, a notebook, a Lambda, or a long-running process. The hub at esker.so is where published datasets live.

What it does

Three jobs, in the order you'll meet them.

Authoring. You write a Python class describing one record of your dataset and a transform function turning one raw row from the source into one of those records. The framework handles schema emission, parquet writing, lineage capture, and manifest construction.
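
A minimal sketch of the authoring shape described above, assuming nothing about the real SDK surface. A frozen standard-library dataclass stands in for the record class, and `Company`, `transform`, and the raw column names `CIK` and `Ticker` are hypothetical. The actual base class and decorator come from `esker` and are not shown here.

```python
from dataclasses import dataclass


# Stand-in for the record class; the real SDK base class is not used here.
@dataclass(frozen=True)
class Company:
    cik: str
    ticker: str


def transform(raw: dict[str, str]) -> Company:
    """Turn one raw source row into one record (pure: no I/O, no globals)."""
    return Company(
        cik=raw["CIK"].strip().zfill(10),      # hypothetical source column
        ticker=raw["Ticker"].strip().upper(),  # hypothetical source column
    )


record = transform({"CIK": " 320193", "Ticker": "aapl "})
print(record)
```

Keeping `transform` a pure function of one row is what lets the framework own everything else: schema emission, parquet writing, lineage, and the manifest.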

Publishing. `esker push` runs the pipeline, gates the schema diff against the last published version, and uploads six artifacts to the hub: parquet, JSON Schema, Arrow IPC schema, TypeScript interface, lineage bundle, manifest. From that moment your dataset is addressable as `<owner>/<name>@<version>`.
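
To make the idea of a schema-diff gate concrete, here is a rough sketch. The classification rule is an assumption for illustration, not Esker's documented policy: removing or retyping a field is treated as breaking, adding a field as additive. `required_bump` and `gate` are hypothetical helpers, and schemas are reduced to plain `{field: type-name}` dictionaries.

```python
def required_bump(old: dict[str, str], new: dict[str, str]) -> str:
    """Classify a schema change as 'major', 'minor', or 'patch'.

    Assumed rule (not taken from the Esker docs): removing or retyping a
    field is breaking, adding a field is additive, anything else is a patch.
    """
    removed = old.keys() - new.keys()
    retyped = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    added = new.keys() - old.keys()
    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "patch"


def gate(old_version: str, new_version: str, bump: str) -> bool:
    """Return True when the version change is at least as large as `bump`."""
    old = tuple(int(part) for part in old_version.split("."))
    new = tuple(int(part) for part in new_version.split("."))
    if bump == "major":
        return new[0] > old[0]
    if bump == "minor":
        return new[:2] > old[:2]
    return new > old


previous = {"cik": "str", "ticker": "str"}
candidate = {"cik": "str"}  # the ticker field was dropped
bump = required_bump(previous, candidate)
print(bump, gate("1.4.0", "1.5.0", bump), gate("1.4.0", "2.0.0", bump))
```

Under these assumed rules, dropping a field would block a publish at `1.5.0` and allow it at `2.0.0`.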

Consuming. Other code calls `esker.get("you/your-dataset")` and receives a polars DataFrame. The cache is content-hash verified on every read. A lockfile pins exact versions so the read is reproducible.
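
Content-hash verification on read can be pictured roughly as follows. This is a sketch under stated assumptions: SHA-256 is an assumed algorithm, `sha256_of` and `verify` are hypothetical helpers, and the file holds stand-in bytes rather than real parquet. The hash scheme and lockfile format Esker actually uses are not specified here.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> None:
    """Raise if the cached file no longer matches the pinned hash."""
    actual = sha256_of(path)
    if actual != expected:
        raise ValueError(f"hash mismatch for {path.name}")


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "data.parquet"
    cached.write_bytes(b"stand-in bytes, not real parquet")
    pinned = sha256_of(cached)  # what a lockfile entry might record (assumed)
    verify(cached, pinned)      # unchanged file: no error

    cached.write_bytes(b"tampered")
    try:
        verify(cached, pinned)
        tampered_detected = False
    except ValueError:
        tampered_detected = True

print(tampered_detected)
```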

The model

Three nouns to hold in your head.

| noun | what | example |
| --- | --- | --- |
| dataset | a published thing on the hub | `archie/us.sec.companies@2.0.0` |
| schema | the record shape contract | `{cik: str, ticker: str, ...}` |
| entity | the real-world thing a record is about | a corporation, a rocket, a yield curve point |

A dataset has rows; each row describes (or is) an entity; the rows conform to a schema. Schemas evolve through SemVer; entities have stable IDs across datasets; datasets get re-published with new manifests over time.
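
As an illustration of the `<owner>/<name>@<version>` address form, the sketch below splits a fully qualified reference into its parts. `DatasetRef` and `parse_ref` are hypothetical names, not SDK functions, and bare names without an owner are deliberately out of scope.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None when the reference carries no @version


def parse_ref(ref: str) -> DatasetRef:
    """Split 'owner/name' or 'owner/name@version' into its parts."""
    path, _, version = ref.partition("@")
    owner, sep, name = path.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected <owner>/<name>[@<version>], got {ref!r}")
    return DatasetRef(owner, name, version or None)


print(parse_ref("archie/us.sec.companies@2.0.0"))
print(parse_ref("you/your-dataset"))
```

Splitting on the first `/` keeps dotted names such as `us.sec.companies` intact.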

How the pieces fit

your project
├── pyproject.toml              entry-point group `esker.pipelines`
├── esker.lock                  pinned versions of consumed datasets
└── src/your_pipelines/
    └── your_dataset.py         @pipeline + transform

         │
         │  esker push your.domain
         ▼
    esker.so/<owner>/<name>     data.parquet, schema.json, schema.arrow,
                                schema.d.ts, lineage.json, manifest.json

         │
         │  esker.get("them/their-dataset")
         ▼
    ~/.esker/cache/<owner>/<name>/<version>/data.parquet
                                            │
                                            ▼
                                  polars.DataFrame

The SDK has no first-party knowledge of any specific data source. Pipelines live in your project: `pip install esker` gives you the abstractions, not the data.

What Esker is not

Not an orchestrator. No DAGs, no triggers, no schedulers. `cadence` is metadata, not behavior. Use cron, Airflow, Dagster, or Prefect to schedule Esker runs.
Not a query engine. `esker.get` returns a polars DataFrame. Bring your own analytics.
Not a transformation framework. `transform` is a per-record pure function, not a SQL model. Esker is upstream of dbt.
Not a data catalog. No tags, no business glossary, no SLAs.
Not a real-time stream. Bulk fetches, batch parquet, occasional runs.

Design principles

Minimalism is a product-level decision. The default is no decoration. Color is signal: red for errors, dim for secondary info. No emoji, no icons, no exclamation marks, no spinners.
Types carry invariants, not convention. `EskerModel` is `frozen=True`, `extra="forbid"`. Records are values, not objects. Misuse fails loudly.
The CLI voice is `git push`, not `dbt run`. Two-line success. No progress bars. No banners.
Bind once, then live in bare-name space. Owner choice is one explicit moment per dataset. Code reads bare names; bindings disambiguate.
No hidden behavior. No background jobs, no implicit retries, no upstream caching for bulk sources. What you see in the script is what runs.
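
The "types carry invariants" principle can be demonstrated without the SDK. In the sketch below a frozen standard-library dataclass stands in for `EskerModel`, which is an assumption made so the example runs anywhere. It shows the two failure modes that `frozen=True` and `extra="forbid"` suggest: mutation is rejected, and unknown fields are rejected. `Record` and its fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


# Stand-in for an EskerModel subclass; not the SDK's actual base class.
@dataclass(frozen=True)
class Record:
    cik: str
    ticker: str


row = Record(cik="0000320193", ticker="AAPL")

# Mutating a record fails loudly instead of silently changing data.
try:
    row.ticker = "MSFT"
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True

# A field the schema does not declare is rejected at construction time.
try:
    Record(cik="0000320193", ticker="AAPL", exchange="NASDAQ")
    extra_rejected = False
except TypeError:
    extra_rejected = True

# Records are values: equality is by content, not identity.
same_value = row == Record(cik="0000320193", ticker="AAPL")

print(mutation_rejected, extra_rejected, same_value)
```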

Where to go next

Getting started for the end-to-end walkthrough.
Pipelines if you have a dataset in mind.
Reading if you only want to consume what others publish.
CLI overview for every command and flag.
