Publishing

Push a dataset to the hub. The check that runs before, the artifacts that go up, and what to do when something fails.

Publishing turns a local pipeline into a versioned dataset that other people can read. Two commands cover the workflow:

esker check <domain>    # see what would be published, no side effects
esker push <domain>     # actually publish

Both run the same compatibility gate. A green check means push will not be blocked by compatibility. The difference: push also runs the pipeline, ships six artifacts to the hub, and registers the new version.

Your first publish

$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 · no prior version

$ esker push us.sec.companies
  you/us.sec.companies@1.0.0
  10,348 records · 2.1s · output/us.sec.companies.parquet
  pushed you/us.sec.companies@1.0.0

Three lines on success. The dataset is now at esker.so/you/us.sec.companies and addressable as you/us.sec.companies@1.0.0.
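A reference has three parts: an owner, then the dataset name, then an optional @version. The sketch below splits one apart for illustration only. parse_ref and DatasetRef are hypothetical names, not part of Esker. The sketch assumes an owner contains no slash and a version contains no @, and it treats a missing version as "latest".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRef:
    owner: str
    name: str
    version: Optional[str]  # None: no pin, i.e. the latest published version


def parse_ref(ref: str) -> DatasetRef:
    """Split '<owner>/<name>[@<version>]' (hypothetical helper, not an Esker API)."""
    path, at, version = ref.partition("@")
    owner, slash, name = path.partition("/")
    if not slash or not owner or not name or "/" in name:
        raise ValueError(f"expected <owner>/<name>[@<version>], got {ref!r}")
    if at and not version:
        raise ValueError(f"empty version in {ref!r}")
    return DatasetRef(owner, name, version if at else None)


ref = parse_ref("you/us.sec.companies@1.0.0")
print(ref)  # DatasetRef(owner='you', name='us.sec.companies', version='1.0.0')
```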

What push actually does

In order:

  1. Authenticates. Without credentials, exits immediately.
  2. Runs the fixture gate. Zero fixtures or any failing fixture blocks the push.
  3. Runs the compatibility gate. Diffs your local schema against the last published one.
  4. Runs the pipeline. Same as esker run — writes parquet and lineage to --output.
  5. Uploads. Six artifacts: parquet, JSON Schema, Arrow schema, TypeScript interface, lineage bundle, manifest.

If anything in steps 1–3 fails, nothing is uploaded and nothing on the hub changes. The pipeline only runs after the gates pass — a blocked schema doesn't waste a fetch.
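The ordering matters because each step gates the next. As a rough model, push behaves like a loop that stops at the first failing step and calls upload only after every earlier step has passed. This is hypothetical code, not Esker's implementation, and the step labels are illustrative.

```python
from typing import Callable, List, Optional, Tuple

# A step returns None on success or an error message on failure.
Step = Tuple[str, Callable[[], Optional[str]]]


def push(steps: List[Step], upload: Callable[[], None]) -> Tuple[int, List[str]]:
    """Run steps in order and upload only if all of them pass (fail-closed)."""
    log: List[str] = []
    for label, run in steps:
        error = run()
        if error is not None:
            log.append(f"{label}: {error}")
            return 1, log  # non-zero exit; upload() is never reached
        log.append(f"{label}: ok")
    upload()
    log.append("upload: ok")
    return 0, log


uploads: List[str] = []
code, log = push(
    [
        ("auth", lambda: None),
        ("fixtures", lambda: None),
        ("compat", lambda: "required bump: major"),
        ("pipeline", lambda: None),  # skipped: a blocked schema never triggers a fetch
    ],
    upload=lambda: uploads.append("six artifacts"),
)
print(code, log, uploads)
```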

Subsequent publishes

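After v1.0.0 is up, every push is a diff against the last published schema. The CLI reports the bump your schema_version declares and whether the change is compatible with it. When the change is not compatible, it also reports the bump that is required: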

$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 → 1.1.0 (minor) · compatible
  field 'name_length' added

$ esker check us.sec.companies
  you/us.sec.companies
  1.0.0 → 1.0.1 (patch) · incompatible
  field 'cik': pattern '^\d{10}$' → '^\d{8}$'
  required bump: major

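The first case passes. Adding a field is compatible with a minor bump, so with schema_version at 1.1.0 you can push. The second case is blocked. The cik pattern change is breaking, so a patch bump is not enough, and the engine requires schema_version 2.0.0.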
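The sketch below shows the comparison the gate makes. It assumes two things: schema_version is plain MAJOR.MINOR.PATCH, and a push passes when the bump you declared is at least the bump the diff requires. declared_bump and is_sufficient are hypothetical helpers. The real classification rules are on the Compatibility page, and the required levels passed in below are illustrative.

```python
from typing import Tuple

LEVELS = {"patch": 0, "minor": 1, "major": 2}


def parse_version(v: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(part) for part in v.split("."))
    return major, minor, patch


def declared_bump(published: str, local: str) -> str:
    """Classify the bump implied by moving from the published to the local version."""
    old, new = parse_version(published), parse_version(local)
    if new <= old:
        raise ValueError(f"{local} does not follow {published}")
    if new[0] > old[0]:
        return "major"  # in Esker, majors also need --force-major (not modeled here)
    if new[1] > old[1]:
        return "minor"
    return "patch"


def is_sufficient(published: str, local: str, required: str) -> bool:
    """True when the declared bump is at least the bump the schema diff requires."""
    return LEVELS[declared_bump(published, local)] >= LEVELS[required]


# The two check outputs above, with illustrative required levels:
print(is_sufficient("1.0.0", "1.1.0", "minor"))  # True: field added, minor declared
print(is_sufficient("1.0.0", "1.0.1", "major"))  # False: pattern change needs 2.0.0
```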

For the full classification rules (what's breaking vs additive, how Literal interacts with the diff), see Compatibility.

Major bumps

Major bumps are not diffed — a major version is treated as an effectively new dataset. The CLI requires you to acknowledge it explicitly:

$ esker push us.sec.companies
  major bump 1.0.0 → 2.0.0 requires --force-major

$ esker push us.sec.companies --force-major
  you/us.sec.companies@2.0.0
  ...
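A major bump skips the diff, so the only thing left to check is the acknowledgment. Below is a minimal sketch of that rule. major_gate is a hypothetical name, and the message string is the one shown above.

```python
from typing import Optional


def major_gate(published: str, local: str, force_major: bool) -> Optional[str]:
    """Return a blocking message, or None if this rule does not block the push.

    A major bump is not diffed against the previous schema; it only needs
    --force-major. Non-major pushes fall through to the compatibility diff.
    """
    old_major = int(published.split(".")[0])
    new_major = int(local.split(".")[0])
    if new_major > old_major and not force_major:
        return f"major bump {published} → {local} requires --force-major"
    return None


print(major_gate("1.0.0", "2.0.0", force_major=False))
print(major_gate("1.0.0", "2.0.0", force_major=True))  # None: treated as a new dataset
```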

Re-publishing the same version

A push at the same schema_version is allowed only if the schema is byte-for-byte identical:

$ esker push us.sec.companies   # local v1.0.0 has a tweaked field type
  you/us.sec.companies@1.0.0 already published with a different schema; bump schema_version

A clean re-publish (same schema, different data) succeeds and links to the prior run via supersedes. Use this when:

  • You've fixed a transform bug that affects record values but not the schema.
  • The source has updated and you want to re-snapshot.

If you intended to change the schema, bump the version.
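The same-version rule reduces to a byte comparison. In this sketch, PriorRelease, its run_id field, and check_republish are all hypothetical. The sketch assumes the hub keeps the exact schema bytes of the prior release. The error text is the one shown above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PriorRelease:
    ref: str             # e.g. "you/us.sec.companies@1.0.0"
    schema_bytes: bytes  # schema exactly as previously published
    run_id: str          # hypothetical id of the run that produced it


def check_republish(prior: PriorRelease, local_schema: bytes) -> Tuple[bool, str]:
    """Same-version push: (True, superseded run id) or (False, error message)."""
    if local_schema != prior.schema_bytes:  # any byte difference counts as drift
        return False, f"{prior.ref} already published with a different schema; bump schema_version"
    return True, prior.run_id  # the new run is recorded as superseding this one


prior = PriorRelease("you/us.sec.companies@1.0.0", b'{"cik": "string"}', "run-0001")
print(check_republish(prior, b'{"cik": "string"}'))   # (True, 'run-0001')
print(check_republish(prior, b'{"cik": "integer"}'))  # rejected: the field type changed
```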

Failure modes

failure | what you'll see | what to do
--- | --- | ---
not signed in | not signed in — run 'esker login' | Sign in, or set ESKER_CREDENTIALS_PATH if running in CI
zero fixtures | 0 fixtures, plus a hint | Add at least one fixture, or pass --force-untested
failed fixtures | <P> passed · <F> failed, plus a hint | Run esker test <domain> to see the diff, then fix the transform
compat blocked (incompatible patch/minor) | the breaking changes, plus required bump: <required> | Bump schema_version to the required level, or revert the breaking change
compat blocked (major) | major bump <a> → <b> requires --force-major | Pass --force-major if you mean it
same-version drift | <ref> already published with a different schema; bump schema_version | Bump the version
pipeline error | <Type>: <msg>, plus a dimmed → frame:line pointer | Fix the transform; the pointer points at your code
upload error | hub <code>: <message> | Check the hub's status, then retry
transport error | <TypeName>: <message> | Check connectivity and ESKER_HUB_URL

Every failure is fail-closed. If push exits non-zero, nothing was uploaded.

Bypasses

Two flags exist for the cases where the gates are wrong about your situation. Use them sparingly.

--force-untested skips the fixture gate entirely. Use it for one-off datasets where adding fixtures would be pure ceremony.

--force-major acknowledges a major bump. Without it, push refuses to publish a new major version.

There is no flag to bypass the compatibility check on patch/minor bumps — the right answer is to bump the version, not to override the rule.

Ownership

By default, push publishes under the handle in your credentials. Override per-push with --owner:

esker push my.domain --owner statcan

You'll need permission to publish under that handle (org membership, etc.).

produced_by on the manifest always reflects your user email — --owner only changes the publishing namespace, not the identity of who ran the push.
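The sketch below shows the split between namespace and identity. Credentials, its handle and email fields, the dataset key, and publish_identity are all hypothetical. Only produced_by is a manifest field named on this page. The permission check for --owner is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Credentials:
    handle: str  # default publishing namespace
    email: str   # who is running the push


def publish_identity(creds: Credentials, domain: str, owner: Optional[str] = None) -> Dict[str, str]:
    namespace = owner if owner is not None else creds.handle  # --owner overrides only this
    return {
        "dataset": f"{namespace}/{domain}",
        "produced_by": creds.email,  # always the pushing user, regardless of --owner
    }


me = Credentials(handle="you", email="you@example.com")
print(publish_identity(me, "my.domain"))
print(publish_identity(me, "my.domain", owner="statcan"))
```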

What lands on the hub

Six artifacts per push, each at a versioned URL:

artifact | contents
--- | ---
data.parquet | the records
schema.json | Pydantic JSON Schema
schema.arrow | Arrow IPC schema bytes
schema.d.ts | TypeScript interface
lineage.json | per-row provenance
manifest.json | identity, integrity, timestamps

esker.so/<owner>/<name>@<version>/data.parquet
esker.so/<owner>/<name>@<version>/schema.json
esker.so/<owner>/<name>@<version>/schema.arrow
esker.so/<owner>/<name>@<version>/schema.d.ts
esker.so/<owner>/<name>@<version>/lineage.json
esker.so/<owner>/<name>@<version>/manifest.json

The version-less paths (/<owner>/<name>/data.parquet) resolve to the latest published version.
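Every artifact sits at a predictable path, so its URL can be built from the owner, name, and version. Below is a minimal sketch. artifact_urls is a hypothetical helper, and the scheme is omitted, as in the paths above.

```python
from typing import List, Optional

ARTIFACTS = (
    "data.parquet",
    "schema.json",
    "schema.arrow",
    "schema.d.ts",
    "lineage.json",
    "manifest.json",
)


def artifact_urls(owner: str, name: str, version: Optional[str] = None) -> List[str]:
    """Versioned paths when version is given; otherwise the latest-version aliases."""
    base = f"esker.so/{owner}/{name}"
    if version is not None:
        base += f"@{version}"
    return [f"{base}/{artifact}" for artifact in ARTIFACTS]


for url in artifact_urls("you", "us.sec.companies", "1.0.0"):
    print(url)
```

Consumers that pin a version always read the same release. The version-less form moves whenever a new version is pushed.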

For what each schema artifact contains and how to consume them, see Arrow & TypeScript artifacts. For the manifest's field shape, see Manifests.

Local outputs

push writes the same local files as esker run (<DOMAIN_ID>.parquet and <DOMAIN_ID>.lineage.json) into --output (default ./output). It does not modify your esker.lock: the lockfile is consumer-side, and publishing is a publisher action.
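The expected local paths follow directly from the domain id and --output. That makes it easy to confirm both files exist after a push, for example before archiving them in CI. Below is a sketch with hypothetical helper names.

```python
from pathlib import Path
from typing import List


def local_outputs(domain_id: str, output: str = "./output") -> List[Path]:
    """Files push (like esker run) writes into --output."""
    out = Path(output)
    return [out / f"{domain_id}.parquet", out / f"{domain_id}.lineage.json"]


def missing_outputs(domain_id: str, output: str = "./output") -> List[Path]:
    """Expected files that are not on disk."""
    return [path for path in local_outputs(domain_id, output) if not path.exists()]


for path in missing_outputs("us.sec.companies"):
    print("missing:", path)
```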

Publishing from CI

The pattern matches any other CI command — provision credentials, then push. See Authenticate from CI for the credentials side.

A complete example:

# .github/workflows/publish.yml
name: Publish dataset
on:
  schedule:
    - cron: "0 4 * * *" # daily at 04:00 UTC
  workflow_dispatch:

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3

      - name: Configure Esker
        run: |
          umask 077  # credentials file is created owner-only (0600)
          mkdir -p "$RUNNER_TEMP/esker"
          echo "$ESKER_CREDENTIALS_JSON" > "$RUNNER_TEMP/esker/credentials"
        env:
          ESKER_CREDENTIALS_JSON: ${{ secrets.ESKER_CREDENTIALS_JSON }}

      - name: Install
        run: uv sync

      # uv sync installs into .venv without activating it, so run esker via uv run
      - name: Test
        run: uv run esker test

      - name: Push
        run: uv run esker push us.sec.companies
        env:
          ESKER_CREDENTIALS_PATH: ${{ runner.temp }}/esker/credentials
          ESKER_HUB_URL: https://hub.esker.so
          ESKER_WEB_URL: https://esker.so

The test step is optional — push runs the fixture gate anyway — but failing fast in a separate step makes the CI log easier to read.

See also

  • Compatibility — the diff classification the gate enforces
  • Manifests — what gets recorded per release
  • Fixtures — the pre-publish gate
  • Auth — credentials, CI patterns, hub URLs