# Publishing

> Push a dataset to the hub: the checks that run first, the artifacts that go up, and what to do when something fails.

Publishing turns a local pipeline into a versioned dataset that other people can read. Two commands cover the workflow:

```sh
esker check <domain>    # see what would be published, no side effects
esker push <domain>     # actually publish
```

Both run the same compatibility gate. A green `check` means `push` will not be blocked by compatibility. The difference: `push` also runs the pipeline, ships six artifacts to the hub, and registers the new version.
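If you script publishing, you can rely on that guarantee and call `push` only after `check` succeeds. Here is a minimal sketch. `publish_domain` is a hypothetical helper. It also assumes that `esker check` exits non-zero when it reports a blocking problem; this page documents non-zero exits for `push`, not for `check`.

```sh
# Hypothetical helper: preview a publish with `check`, then publish with `push`.
# Assumes `esker check` exits non-zero when a gate would block the push.
publish_domain() {
  domain="$1"
  if ! esker check "$domain"; then
    echo "check failed for $domain; not pushing" >&2
    return 1
  fi
  esker push "$domain"
}
```

Call it as `publish_domain us.sec.companies`. `push` runs the same gate itself, so the separate `check` mostly gives you a clearer log. The preview is printed before anything is published.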

## Your first publish

```sh
$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 · no prior version

$ esker push us.sec.companies
  you/us.sec.companies@1.0.0
  10,348 records · 2.1s · output/us.sec.companies.parquet
  pushed you/us.sec.companies@1.0.0
```

Three lines on success. The dataset is now at `esker.so/you/us.sec.companies` and addressable as `you/us.sec.companies@1.0.0`.

## What `push` actually does

In order:

1. **Authenticates.** Without credentials, exits immediately.
2. **Runs the fixture gate.** Zero fixtures or any failing fixture blocks the push.
3. **Runs the compatibility gate.** Diffs your local schema against the last published one.
4. **Runs the pipeline.** Same as `esker run` — writes parquet and lineage to `--output`.
5. **Uploads.** Ships six artifacts (parquet, JSON Schema, Arrow schema, TypeScript interface, lineage bundle, manifest) and registers the new version.

If anything in steps 1–3 fails, nothing is uploaded and nothing on the hub changes. The pipeline only runs after the gates pass — a blocked schema doesn't waste a fetch.
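The ordering above is a fail-closed chain: each step runs only if every earlier step succeeded. This shell sketch shows that control flow. Every `step_*` function is a hypothetical stand-in, not an Esker internal.

```sh
# Sketch of push's documented order. Each step_* function is a hypothetical
# stand-in. `&&` stops at the first failing step, so nothing after it runs.
push_sketch() {
  step_authenticate &&
    step_fixture_gate &&
    step_compat_gate &&
    step_run_pipeline &&
    step_upload
}
```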

## Subsequent publishes

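After v1.0.0 is up, every push is a diff against the last published schema. The CLI shows three things: the bump your local `schema_version` represents, whether the schema change fits within that bump, and, if it doesn't, the bump that is required: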

```
$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 → 1.1.0 (minor) · compatible
  field 'name_length' added
```

```
$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 → 1.0.1 (patch) · incompatible
  field 'cik': pattern '^\d{10}$' → '^\d{8}$'
  required bump: major
```

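The first case passes. Your local `schema_version` is already `1.1.0`, and the added field fits a minor bump, so `push` goes through. The second case is blocked. Changing a field's `pattern` is breaking, so a patch bump to `1.0.1` isn't enough. The engine requires a major bump to `2.0.0`, which also needs `--force-major` (see below).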
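The version arithmetic behind these messages is ordinary semantic versioning. Here is a minimal sketch. It assumes `schema_version` is a bare `MAJOR.MINOR.PATCH` string with no pre-release or build suffix. `next_version` is a hypothetical helper, not part of the CLI.

```sh
# Hypothetical helper: compute the next MAJOR.MINOR.PATCH for a bump kind.
# Assumes a plain three-part version; pre-release/build suffixes are not handled.
next_version() {
  version="$1"
  kind="$2"
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot, e.g. "0.0"
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$kind" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "$major.$((minor + 1)).0" ;;
    patch) echo "$major.$minor.$((patch + 1))" ;;
    *) echo "unknown bump kind: $kind" >&2; return 1 ;;
  esac
}
```

`next_version 1.0.0 minor` prints `1.1.0`. `next_version 1.0.0 major` prints `2.0.0`, which is the version the blocked example above requires.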

For the full classification rules (what's breaking vs additive, how `Literal` interacts with the diff), see [Compatibility](https://esker.so/docs/protocol/compatibility.md).

## Major bumps

Major bumps are not diffed: a new major version is treated, in effect, as a new dataset. The CLI requires you to acknowledge it explicitly:

```
$ esker push us.sec.companies
  major bump 1.0.0 → 2.0.0 requires --force-major

$ esker push us.sec.companies --force-major
  you/us.sec.companies@2.0.0
  ...
```

:::warn
Major bumps break consumers. Schedule them, communicate them ahead of time, and keep the old version available for at least one consumer release cycle. `--force-major` is the moment you confirm you mean it.
:::
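In CI, it can help to make `--force-major` something a person opts into for a single run, rather than a flag baked into every scheduled push. Here is a sketch. It assumes a hypothetical `ALLOW_MAJOR` variable that you set yourself, for example from a manual workflow input. Esker does not read it.

```sh
# Hypothetical CI guard: pass --force-major only when someone opted in with
# ALLOW_MAJOR=1. ALLOW_MAJOR is your own variable; Esker does not read it.
push_with_guard() {
  domain="$1"
  if [ "${ALLOW_MAJOR:-0}" = "1" ]; then
    esker push "$domain" --force-major
  else
    esker push "$domain"
  fi
}
```

Without the opt-in, an accidental major bump still hits the `requires --force-major` block and the job fails closed.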

## Re-publishing the same version

A push at the same `schema_version` is allowed only if the schema is byte-for-byte identical:

```
$ esker push us.sec.companies   # local v1.0.0 has a tweaked field type
  you/us.sec.companies@1.0.0 already published with a different schema; bump schema_version
```

A clean re-publish (same schema, different data) succeeds and links to the prior run via [`supersedes`](https://esker.so/docs/protocol/manifests.md#supersedes). Use this when:

- You've fixed a transform bug that affects record values but not the schema.
- The source has updated and you want to re-snapshot.

If you intended to change the schema, bump the version.
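You can run a rough version of the same-version check yourself before pushing. Here is a sketch. It assumes you have both the published `schema.json` and a locally generated one as files. This page doesn't specify which serialization the hub actually compares, so treat a mismatch as a hint rather than a verdict. `same_version_ok` is a hypothetical helper.

```sh
# Hypothetical pre-flight: would a same-version push likely be accepted?
# `cmp -s` compares two files byte for byte and exits 0 only if identical.
same_version_ok() {
  if cmp -s "$1" "$2"; then
    echo "identical schema: re-publish allowed"
  else
    echo "schema differs: bump schema_version"
    return 1
  fi
}
```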

## Failure modes

| failure                                   | what you'll see                                                        | what to do                                                                 |
| ----------------------------------------- | ---------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| not signed in                             | `not signed in — run 'esker login'`                                    | Sign in, or set `ESKER_CREDENTIALS_PATH` if running in CI                  |
| zero fixtures                             | `0 fixtures` + hint                                                    | Add one fixture, or pass `--force-untested`                                |
| failed fixtures                           | `<P> passed · <F> failed` + hint                                       | Run `esker test <domain>` to see the diff, fix the transform               |
| compat blocked (incompatible patch/minor) | breaking changes listed + `required bump: <required>`                  | Bump `schema_version` to the required level, or revert the breaking change |
| major bump without `--force-major`        | `major bump <a> → <b> requires --force-major`                          | Pass `--force-major` if you mean it                                        |
| same-version drift                        | `<ref> already published with a different schema; bump schema_version` | Bump the version                                                           |
| pipeline error                            | `<Type>: <msg>` + dim `→ frame:line`                                   | Fix the transform; the pointer points at your code                         |
| upload error                              | `hub <code>: <message>`                                                | Check the hub's status; retry                                              |
| transport error                           | `<TypeName>: <message>`                                                | Check connectivity; check `ESKER_HUB_URL`                                  |

Every failure is fail-closed. If `push` exits non-zero, nothing was uploaded.
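Every failure is fail-closed, so a script can treat the exit status of `push` as the whole outcome. Zero means the version is published. Non-zero means the hub is unchanged, so rerunning after fixing the cause is safe. Here is a sketch. `report_push` is a hypothetical helper name.

```sh
# Hypothetical helper: run a push and report the outcome from its exit status.
# Non-zero means nothing was uploaded, so the hub is in its prior state.
report_push() {
  if esker push "$1"; then
    echo "published $1"
  else
    rc=$?   # exit status of the failed `esker push`
    echo "push failed for $1 (exit $rc); hub unchanged" >&2
    return "$rc"
  fi
}
```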

## Bypasses

Two flags exist for the cases where the gates are wrong about your situation. Use them sparingly.

**`--force-untested`** skips the fixture gate entirely. It is meant for one-off datasets where adding fixtures would be pure ceremony.

**`--force-major`** lets a major bump through. Without it, `push` refuses any major bump.

There is no flag to bypass the compatibility check on patch/minor bumps — the right answer is to bump the version, not to override the rule.

## Ownership

By default, `push` publishes under the handle in your credentials. Override per-push with `--owner`:

```sh
esker push my.domain --owner statcan
```

You'll need permission to publish under that handle (org membership, etc.).

`produced_by` on the manifest always reflects your user email — `--owner` only changes the publishing namespace, not the identity of who ran the push.

## What lands on the hub

Six artifacts per push, each at a versioned URL:

| artifact        | what                            |
| --------------- | ------------------------------- |
| `data.parquet`  | the records                     |
| `schema.json`   | Pydantic JSON Schema            |
| `schema.arrow`  | Arrow IPC schema bytes          |
| `schema.d.ts`   | TypeScript interface            |
| `lineage.json`  | per-row provenance              |
| `manifest.json` | identity, integrity, timestamps |

```
esker.so/<owner>/<name>@<version>/data.parquet
esker.so/<owner>/<name>@<version>/schema.json
esker.so/<owner>/<name>@<version>/schema.arrow
esker.so/<owner>/<name>@<version>/schema.d.ts
esker.so/<owner>/<name>@<version>/lineage.json
esker.so/<owner>/<name>@<version>/manifest.json
```

The version-less paths (`/<owner>/<name>/data.parquet`) resolve to the latest published version.
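The URL layout is regular enough to generate. The sketch below prints the six versioned URLs for a release, or the version-less (latest) ones when no version is passed. `artifact_urls` is hypothetical. The `https://` scheme is an assumption, since the paths above omit it.

```sh
# Hypothetical helper: print the hub URLs for one published version.
# With no version argument, prints the version-less "latest" paths instead.
# The https:// scheme is an assumption; the docs show bare esker.so paths.
artifact_urls() {
  owner="$1"
  name="$2"
  version="${3:-}"
  base="https://esker.so/$owner/$name"
  if [ -n "$version" ]; then
    base="$base@$version"
  fi
  for file in data.parquet schema.json schema.arrow schema.d.ts lineage.json manifest.json; do
    echo "$base/$file"
  done
}
```

`artifact_urls you us.sec.companies 1.0.0` prints `https://esker.so/you/us.sec.companies@1.0.0/data.parquet` as its first line.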

For what each schema artifact contains and how to consume them, see [Arrow & TypeScript artifacts](https://esker.so/docs/guides/arrow-typescript.md). For the manifest's field shape, see [Manifests](https://esker.so/docs/protocol/manifests.md).

## Local outputs

`push` writes the same local files as `esker run` (`<DOMAIN_ID>.parquet` and `<DOMAIN_ID>.lineage.json`) into `--output` (default `./output`). It does not modify your `esker.lock`. The lockfile is a consumer-side record, and publishing is a publisher action.
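A post-push step that consumes these files can first confirm they are where it expects. Here is a sketch. `check_local_outputs` is a hypothetical helper, and it falls back to `./output` to mirror the default `--output`.

```sh
# Hypothetical post-push check: confirm both local outputs exist.
# The default directory mirrors push's default --output of ./output.
check_local_outputs() {
  domain="$1"
  out="${2:-./output}"
  missing=0
  for f in "$out/$domain.parquet" "$out/$domain.lineage.json"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```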

## Publishing from CI

The pattern matches any other CI command — provision credentials, then push. See [Authenticate from CI](https://esker.so/docs/cli/auth.md#authenticate-from-ci) for the credentials side.

A complete example:

```yaml
# .github/workflows/publish.yml
name: Publish dataset
on:
  schedule:
    - cron: "0 4 * * *" # daily at 04:00 UTC
  workflow_dispatch:

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3

      - name: Configure Esker
        run: |
          umask 077
          mkdir -p "$RUNNER_TEMP/esker"
          printf '%s\n' "$ESKER_CREDENTIALS_JSON" > "$RUNNER_TEMP/esker/credentials"
          chmod 600 "$RUNNER_TEMP/esker/credentials"
        env:
          ESKER_CREDENTIALS_JSON: ${{ secrets.ESKER_CREDENTIALS_JSON }}

      - name: Install
        run: uv sync

      - name: Test
        run: uv run esker test

      - name: Push
        run: uv run esker push us.sec.companies
        env:
          ESKER_CREDENTIALS_PATH: ${{ runner.temp }}/esker/credentials
          ESKER_HUB_URL: https://hub.esker.so
          ESKER_WEB_URL: https://esker.so
```

The `test` step is optional — `push` runs the fixture gate anyway — but failing fast in a separate step makes the CI log easier to read.

## See also

- [Compatibility](https://esker.so/docs/protocol/compatibility.md) — the diff classification the gate enforces
- [Manifests](https://esker.so/docs/protocol/manifests.md) — what gets recorded per release
- [Fixtures](https://esker.so/docs/sdk/fixtures.md) — the pre-publish gate
- [Auth](https://esker.so/docs/cli/auth.md) — credentials, CI patterns, hub URLs
