# Getting started

> Install Esker, write a pipeline, test it, publish it, consume it. End to end in five minutes.

This walkthrough builds a pipeline that publishes the SEC's company-ticker file. About five minutes.

## Install

Esker requires Python 3.12+. The toolchain assumes [uv](https://github.com/astral-sh/uv) for dependency management — pip works too.

:::tabs
:::tab{label=uv}

```sh
uv init my-pipelines
cd my-pipelines
uv add esker
```

:::
:::tab{label=pip}

```sh
mkdir my-pipelines && cd my-pipelines
python -m venv .venv && source .venv/bin/activate
pip install esker
```

:::
:::

The package installs two console scripts that point at the same CLI: `esker` and the shorter `esk`. Invoke either directly:

```sh
esker --help
```

## Sign in

Authoring works offline. Publishing needs an account.

```sh
esker login
```

A browser window opens. After you sign in, the CLI prints:

```
  signed in as you@example.com · publishing as you
```

Credentials land at `~/.esker/credentials` (mode 0600). See [Auth](https://esker.so/docs/cli/auth.md) for the full flow and env-var overrides.

## Write the pipeline

Create `src/my_pipelines/sec_companies.py`:

```python
from typing import Annotated
from pydantic import Field
from esker import pipeline


@pipeline(
    "us.sec.companies@1.0.0",
    url="https://www.sec.gov/files/company_tickers.json",
    entity_type="corp",
    key="cik",
    source_url="https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik}",
    cadence="daily",
)
class SecCompany:
    cik: Annotated[str, Field(pattern=r"^\d{10}$")]
    ticker: Annotated[str, Field(min_length=1, max_length=10)]
    title: str

    @classmethod
    def transform(cls, raw: dict) -> "SecCompany":
        return cls(
            cik=str(raw["cik_str"]).zfill(10),
            ticker=raw["ticker"],
            title=raw["title"],
        )
```

The decorator parses `<domain>@<semver>`, wraps the class as an `EskerModel`, synthesizes a `BulkJsonSource` from `url=`, builds an `EskerPipeline`, and registers it. You write the record shape and the per-row transform; everything else is generated.
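To make the first of those steps concrete, here is a minimal sketch of splitting a `<domain>@<semver>` string. `PipelineRef` and `parse_ref` are hypothetical names used for illustration, not part of the Esker API, and the pattern is a simplification that accepts only plain `MAJOR.MINOR.PATCH` versions.

```python
import re
from typing import NamedTuple


class PipelineRef(NamedTuple):
    """Hypothetical container for illustration; not an Esker type."""

    domain: str
    version: tuple[int, int, int]


# Simplified pattern: a dotted lowercase domain, then MAJOR.MINOR.PATCH.
# Full SemVer also allows pre-release and build metadata, omitted here.
_REF = re.compile(
    r"^(?P<domain>[a-z0-9_]+(?:\.[a-z0-9_]+)*)"
    r"@(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_ref(ref: str) -> PipelineRef:
    """Split 'us.sec.companies@1.0.0' into its domain and version parts."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"expected <domain>@<semver>, got {ref!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return PipelineRef(match["domain"], version)


print(parse_ref("us.sec.companies@1.0.0"))
# PipelineRef(domain='us.sec.companies', version=(1, 0, 0))
```

Keeping the version as a tuple of integers means versions compare numerically, so `(1, 10, 0)` sorts after `(1, 9, 0)`.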

Three injected fields — `esker_id`, `esker_source_url`, `esker_lineage_id` — land on each record at run time. You never set them yourself. See [Records](https://esker.so/docs/sdk/records.md) for the full mechanism.
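As a rough illustration of what two of those fields might contain, the sketch below derives them from the decorator arguments. The ID layout is an assumption inferred from the example ID `esker:us:corp:0000320193` used later in this walkthrough, and `sketch_injected_fields` is a hypothetical helper, not Esker's implementation. `esker_lineage_id` is left out because its format is not described here.

```python
def sketch_injected_fields(
    domain: str,
    entity_type: str,
    key: str,
    key_value: str,
    source_url_template: str,
) -> dict[str, str]:
    """Guess at two injected fields; the real mechanism lives in the Records docs."""
    # Assumption: the first domain segment ("us") becomes the ID's region part.
    region = domain.split(".")[0]
    return {
        "esker_id": f"esker:{region}:{entity_type}:{key_value}",
        # Assumption: the {cik} placeholder is filled with the record's key value.
        "esker_source_url": source_url_template.format(**{key: key_value}),
    }


fields = sketch_injected_fields(
    domain="us.sec.companies",
    entity_type="corp",
    key="cik",
    key_value="0000320193",
    source_url_template=(
        "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={cik}"
    ),
)
print(fields["esker_id"])
print(fields["esker_source_url"])
```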

## Register the entry point

Esker discovers pipelines via `importlib.metadata`. Add to `pyproject.toml`:

```toml
[project]
dependencies = ["esker"]

[project.entry-points."esker.pipelines"]
sec_companies = "my_pipelines.sec_companies"
```

After editing entry points, reinstall the package so the metadata refreshes:

```sh
uv pip install -e . --reinstall-package my-pipelines
```

Confirm the pipeline shows up:

```
$ esker list
  us.sec.companies  1.0.0  daily  never run
```

## Run it locally

```
$ esker run us.sec.companies
  us.sec.companies@1.0.0
  10,348 records · 2.1s · output/us.sec.companies.parquet
```

Two files land in `./output/`:

```
output/
├── us.sec.companies.parquet
└── us.sec.companies.lineage.json
```

The Parquet file contains the three fields you authored plus the three injected `esker_*` columns. The lineage JSON records what was fetched, when, and from where. See [Lineage](https://esker.so/docs/protocol/lineage.md) for the format.

## Add a fixture

A fixture is a `(raw_*.json, expected_*.json)` pair. The harness diffs `transform(raw).model_dump(mode="json")` against `expected`.

`src/my_pipelines/sec_companies_fixtures/raw_basic.json`:

```json
{
  "cik_str": 320193,
  "ticker": "AAPL",
  "title": "Apple Inc."
}
```

Run with `--update` to materialize the expected file:

```
$ esker test us.sec.companies --update
  us.sec.companies@1.0.0
  wrote expected_basic.json
```

Re-run to confirm:

```
$ esker test us.sec.companies
  us.sec.companies@1.0.0
  1 passed · 0.0s
```
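Conceptually, a fixture check is a dictionary comparison. The sketch below mimics it without Esker or pydantic: `transform` is a plain-function stand-in for `SecCompany.transform(raw).model_dump(mode="json")`, `diff_fixture` is a hypothetical helper, and the expected values assume the dump holds only the three authored fields.

```python
import json


def transform(raw: dict) -> dict:
    """Stand-in for SecCompany.transform(raw).model_dump(mode="json")."""
    return {
        "cik": str(raw["cik_str"]).zfill(10),
        "ticker": raw["ticker"],
        "title": raw["title"],
    }


def diff_fixture(raw: dict, expected: dict) -> dict[str, tuple]:
    """Return {field: (expected, actual)} for every field that differs."""
    actual = transform(raw)
    keys = sorted(actual.keys() | expected.keys())
    return {
        key: (expected.get(key), actual.get(key))
        for key in keys
        if expected.get(key) != actual.get(key)
    }


raw = json.loads('{"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}')
expected = {"cik": "0000320193", "ticker": "AAPL", "title": "Apple Inc."}

print(diff_fixture(raw, expected))                     # {}
print(diff_fixture(raw, {**expected, "cik": "320193"}))
# {'cik': ('320193', '0000320193')}
```

An empty diff corresponds to a passing fixture; any entry in the result is a field to investigate.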

`esker push` refuses to run if you have zero fixtures or any failing fixture. Pass `--force-untested` to bypass the gate deliberately. See [Fixtures](https://esker.so/docs/sdk/fixtures.md) for layouts and conventions.

## Check schema compatibility

Before pushing, see what the hub thinks of the schema diff:

```
$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 · no prior version
```

This is the first publish, so there is nothing to compare against. Once v1.0.0 is published, later `check` runs report breaking versus additive changes and the minimum required SemVer bump. `esker push` runs the same gate. Read [Compatibility](https://esker.so/docs/protocol/compatibility.md) for the full classification rules.
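The idea of mapping a schema diff to a minimum version bump can be sketched as follows. The rules here are a common convention assumed for illustration (removed or retyped fields are breaking, new fields are additive); the authoritative rules are on the Compatibility page, and `classify` is a hypothetical helper.

```python
def classify(old: dict[str, str], new: dict[str, str]) -> tuple[str, str]:
    """Compare {field: type} maps and return (change kind, minimum bump)."""
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    retyped = {name for name in old.keys() & new.keys() if old[name] != new[name]}

    if removed or retyped:
        return "breaking", "major"   # existing consumers could fail
    if added:
        return "additive", "minor"   # existing consumers keep working
    return "unchanged", "none"


v1 = {"cik": "string", "ticker": "string", "title": "string"}

print(classify(v1, {**v1, "exchange": "string"}))   # ('additive', 'minor')
print(classify(v1, {"cik": "string", "ticker": "string"}))  # ('breaking', 'major')
print(classify(v1, {**v1, "cik": "int64"}))         # ('breaking', 'major')
```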

## Push

```
$ esker push us.sec.companies
  you/us.sec.companies@1.0.0
  10,348 records · 2.1s · output/us.sec.companies.parquet
  pushed you/us.sec.companies@1.0.0
```

Six artifacts land on the hub: `data.parquet`, `schema.json`, `schema.arrow`, `schema.d.ts`, `lineage.json`, `manifest.json`. From this point on, your dataset is available at `esker.so/you/us.sec.companies`.

## Consume it

In another project (or the same one), bind the dataset:

```
$ esker add you/us.sec.companies
  us.sec.companies → you/us.sec.companies@1.0.0
  pyproject.toml · esker.lock
```

`esker add` writes a binding into `pyproject.toml [tool.esker.datasets]` and pins the resolved version in `esker.lock`. Now bare-name lookups work:

```python
import esker

frame = esker.get("us.sec.companies")
print(frame.head())
```

`esker.get` resolves the bare name through bindings, fetches the manifest, downloads the parquet (cached at `~/.esker/cache/<owner>/<name>/<version>/`), verifies its content hash, and returns a polars `DataFrame`.

For one record by entity ID:

```python
apple = esker.get_one("us.sec.companies", esker_id="esker:us:corp:0000320193")
```

For an equality filter:

```python
matches = esker.search("us.sec.companies", ticker="AAPL")
```

See [Reading](https://esker.so/docs/sdk/reading.md) for the full surface.

## Where to go next

- [Pipelines](https://esker.so/docs/sdk/pipelines.md) — every decorator option.
- [Three-class form](https://esker.so/docs/sdk/three-class-form.md) — when the decorator isn't enough.
- [Manifests](https://esker.so/docs/protocol/manifests.md) — what the hub stores per release.
- [CLI overview](https://esker.so/docs/cli/overview.md) — every command, every flag.
