Publishing
Push a dataset to the hub. The check that runs before, the artifacts that go up, and what to do when something fails.
Publishing turns a local pipeline into a versioned dataset that other people can read. Two commands cover the workflow:
esker check <domain> # see what would be published, no side effects
esker push <domain> # actually publish
Both run the same compatibility gate. A green check means push will not be blocked by compatibility. The difference: push also runs the pipeline, ships six artifacts to the hub, and registers the new version.
Your first publish
$ esker check us.sec.companies
you/us.sec.companies
1.0.0 · no prior version
$ esker push us.sec.companies
you/us.sec.companies@1.0.0
10,348 records · 2.1s · output/us.sec.companies.parquet
pushed you/us.sec.companies@1.0.0
Three lines on success. The dataset is now at esker.so/you/us.sec.companies and addressable as you/us.sec.companies@1.0.0.
What push actually does
In order:
- Authenticates. Without credentials, exits immediately.
- Runs the fixture gate. Zero fixtures or any failing fixture blocks the push.
- Runs the compatibility gate. Diffs your local schema against the last published one.
- Runs the pipeline. Same as esker run: writes parquet and lineage to --output.
- Uploads. Six artifacts: parquet, JSON Schema, Arrow schema, TypeScript interface, lineage bundle, manifest.
If anything in steps 1–3 fails, nothing is uploaded and nothing on the hub changes. The pipeline only runs after the gates pass — a blocked schema doesn't waste a fetch.
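The ordering above can be sketched as a short-circuiting sequence of steps. This is an illustration of the fail-closed behaviour only, not the CLI's actual implementation; the step names and the StepResult type are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    ok: bool
    message: str = ""


def run_push(steps: List[Tuple[str, Callable[[], StepResult]]]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (success, names of the steps that completed).
    """
    completed: List[str] = []
    for name, step in steps:
        result = step()
        if not result.ok:
            # Fail closed: later steps (pipeline, upload) never start.
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical stand-ins for the five stages; the compatibility gate fails here.
steps = [
    ("authenticate", lambda: StepResult(True)),
    ("fixture_gate", lambda: StepResult(True)),
    ("compat_gate", lambda: StepResult(False, "required bump: major")),
    ("pipeline", lambda: StepResult(True)),
    ("upload", lambda: StepResult(True)),
]

ok, completed = run_push(steps)
print(ok, completed)
```

Because the compatibility gate fails in this example, neither the pipeline nor the upload step is reached.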
Subsequent publishes
After v1.0.0 is up, every push is a diff against the last published schema. The CLI reports what kind of bump is needed:
$ esker check us.sec.companies
you/us.sec.companies
1.0.0 → 1.1.0 (minor) · compatible
field 'name_length' added
$ esker check us.sec.companies
you/us.sec.companies
1.0.0 → 1.0.1 (patch) · incompatible
field 'cik': pattern '^\d{10}$' → '^\d{8}$'
required bump: major
The first case is fine — bump schema_version in your code to 1.1.0 and push. The second is blocked — the engine says you need to bump to 2.0.0.
For the full classification rules (what's breaking vs additive, how Literal interacts with the diff), see Compatibility.
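To make the two cases above concrete, here is a minimal sketch of comparing the bump you declared with the bump the diff requires. It assumes a push is blocked whenever the declared bump is smaller than the required one; the function names are hypothetical, and the required level is taken as an input because the classification rules live on the Compatibility page.

```python
from typing import Tuple

# Ordering of bump sizes, smallest to largest.
ORDER = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(previous: str, declared: str) -> str:
    """Classify the step from the published version to the declared one."""
    old, new = parse(previous), parse(declared)
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    if new[2] != old[2]:
        return "patch"
    return "none"


def is_sufficient(declared_bump: str, required_bump: str) -> bool:
    """Assumption: the declared bump must be at least as large as the required one."""
    return ORDER[declared_bump] >= ORDER[required_bump]


# Case 1: field added, 1.0.0 -> 1.1.0, required bump is minor.
print(is_sufficient(bump_kind("1.0.0", "1.1.0"), "minor"))
# Case 2: pattern changed, 1.0.0 -> 1.0.1, required bump is major.
print(is_sufficient(bump_kind("1.0.0", "1.0.1"), "major"))
```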
Major bumps
Major bumps are not diffed: a major version is treated as effectively a new dataset. The CLI requires you to acknowledge it explicitly:
$ esker push us.sec.companies
major bump 1.0.0 → 2.0.0 requires --force-major
$ esker push us.sec.companies --force-major
you/us.sec.companies@2.0.0
...
Re-publishing the same version
A push at the same schema_version is allowed only if the schema is byte-for-byte identical:
$ esker push us.sec.companies # local v1.0.0 has a tweaked field type
you/us.sec.companies@1.0.0 already published with a different schema; bump schema_version
A clean re-publish (same schema, different data) succeeds and links to the prior run via supersedes. Use this when:
- You've fixed a transform bug that affects record values but not the schema.
- The source has updated and you want to re-snapshot.
If you intended to change the schema, bump the version.
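If you want to check locally whether a re-publish would count as "same schema", a byte comparison is enough. The sketch below assumes you have the published and local schema.json contents as bytes; how the hub performs its own comparison is not specified here, and the helper names are hypothetical.

```python
import hashlib
import json


def schema_digest(schema_bytes: bytes) -> str:
    """SHA-256 of the raw schema bytes, handy for logs."""
    return hashlib.sha256(schema_bytes).hexdigest()


def same_schema(published: bytes, local: bytes) -> bool:
    """Byte-for-byte comparison: any change, however small, counts as drift."""
    return published == local


# Illustrative schemas, serialized deterministically for the example.
published = json.dumps({"properties": {"cik": {"type": "string"}}}, sort_keys=True).encode()
unchanged = json.dumps({"properties": {"cik": {"type": "string"}}}, sort_keys=True).encode()
tweaked = json.dumps({"properties": {"cik": {"type": "integer"}}}, sort_keys=True).encode()

print(same_schema(published, unchanged))  # re-publish at the same version is fine
print(same_schema(published, tweaked))    # drift: bump schema_version instead
```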
Failure modes
| where | what you'll see | what to do |
|---|---|---|
| not signed in | not signed in — run 'esker login' | Sign in, or set ESKER_CREDENTIALS_PATH if running in CI |
| zero fixtures | 0 fixtures + hint | Add one fixture, or pass --force-untested |
| failed fixtures | <P> passed · <F> failed + hint | Run esker test <domain> to see the diff, fix the transform |
| compat blocked (incompatible patch/minor) | breaking changes listed + required bump: <required> | Bump schema_version to the required level, or revert the breaking change |
| compat blocked (major) | major bump <a> → <b> requires --force-major | Pass --force-major if you mean it |
| same-version drift | <ref> already published with a different schema; bump schema_version | Bump the version |
| pipeline error | <Type>: <msg> + dim → frame:line | Fix the transform; the pointer points at your code |
| upload error | hub <code>: <message> | Check the hub's status; retry |
| transport error | <TypeName>: <message> | Check connectivity; check ESKER_HUB_URL |
Every failure is fail-closed. If push exits non-zero, nothing was uploaded.
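In a script, the exit code is all you need to branch on. The wrapper below is a minimal sketch that runs esker check and then esker push, stopping at the first non-zero exit. It assumes esker check also exits non-zero when the gate fails, which this page does not state; publish and run_cli are hypothetical names.

```python
import subprocess
from typing import Callable, List


def publish(domain: str, run: Callable[[List[str]], int]) -> int:
    """Run `esker check` then `esker push`; return the first non-zero exit code."""
    for command in (["esker", "check", domain], ["esker", "push", domain]):
        code = run(command)
        if code != 0:
            # Stop here: a failed check means push is never attempted.
            return code
    return 0


def run_cli(command: List[str]) -> int:
    """Execute a command and return its exit code."""
    return subprocess.run(command).returncode
```

Call it as publish("us.sec.companies", run_cli) and pass the result to sys.exit. Taking the runner as a parameter keeps the ordering logic testable without the CLI installed.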
Bypasses
Two flags exist for the cases where the gates are wrong about your situation. Use them sparingly.
--force-untested skips the fixture gate entirely. For one-off datasets where adding fixtures would be ceremony.
--force-major acknowledges a major bump. Without it, push stops with the "requires --force-major" message and publishes nothing.
There is no flag to bypass the compatibility check on patch/minor bumps — the right answer is to bump the version, not to override the rule.
Ownership
By default, push publishes under the handle in your credentials. Override per-push with --owner:
esker push my.domain --owner statcan
You'll need permission to publish under that handle (org membership, etc.).
produced_by on the manifest always reflects your user email — --owner only changes the publishing namespace, not the identity of who ran the push.
What lands on the hub
Six artifacts per push, each at a versioned URL:
| artifact | what |
|---|---|
| data.parquet | the records |
| schema.json | Pydantic JSON Schema |
| schema.arrow | Arrow IPC schema bytes |
| schema.d.ts | TypeScript interface |
| lineage.json | per-row provenance |
| manifest.json | identity, integrity, timestamps |
esker.so/<owner>/<name>@<version>/data.parquet
esker.so/<owner>/<name>@<version>/schema.json
esker.so/<owner>/<name>@<version>/schema.arrow
esker.so/<owner>/<name>@<version>/schema.d.ts
esker.so/<owner>/<name>@<version>/lineage.json
esker.so/<owner>/<name>@<version>/manifest.json
The version-less paths (/<owner>/<name>/data.parquet) resolve to the latest published version.
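The URL layout above is regular enough to build programmatically. Here is a small sketch; the https scheme and the helper names are assumptions, while the path shape and artifact names come from the list above.

```python
from typing import List, Optional

ARTIFACTS = [
    "data.parquet",
    "schema.json",
    "schema.arrow",
    "schema.d.ts",
    "lineage.json",
    "manifest.json",
]


def artifact_url(
    owner: str,
    name: str,
    artifact: str,
    version: Optional[str] = None,
    base: str = "https://esker.so",  # assumed scheme; the page shows esker.so/...
) -> str:
    """Build a versioned URL, or a version-less one when version is None."""
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact}")
    ref = f"{owner}/{name}" + (f"@{version}" if version else "")
    return f"{base}/{ref}/{artifact}"


def all_artifact_urls(owner: str, name: str, version: Optional[str] = None) -> List[str]:
    """URLs for all six artifacts of one dataset version."""
    return [artifact_url(owner, name, artifact, version) for artifact in ARTIFACTS]


print(artifact_url("you", "us.sec.companies", "data.parquet", "1.0.0"))
print(artifact_url("you", "us.sec.companies", "data.parquet"))  # latest published version
```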
For what each schema artifact contains and how to consume them, see Arrow & TypeScript artifacts. For the manifest's field shape, see Manifests.
Local outputs
push writes the same local files as esker run — <DOMAIN_ID>.parquet and <DOMAIN_ID>.lineage.json — into --output (default ./output). It does not modify your esker.lock (the lockfile is consumer-side; publishing is a publisher action).
Publishing from CI
The pattern matches any other CI command — provision credentials, then push. See Authenticate from CI for the credentials side.
A complete example:
# .github/workflows/publish.yml
name: Publish dataset
on:
  schedule:
    - cron: "0 4 * * *" # daily at 04:00 UTC
  workflow_dispatch:
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - name: Configure Esker
        run: |
          mkdir -p $RUNNER_TEMP/esker
          echo "$ESKER_CREDENTIALS_JSON" > $RUNNER_TEMP/esker/credentials
          chmod 600 $RUNNER_TEMP/esker/credentials
        env:
          ESKER_CREDENTIALS_JSON: ${{ secrets.ESKER_CREDENTIALS_JSON }}
      - name: Install
        run: uv sync
      - name: Test
        run: esker test
      - name: Push
        run: esker push us.sec.companies
        env:
          ESKER_CREDENTIALS_PATH: ${{ runner.temp }}/esker/credentials
          ESKER_HUB_URL: https://hub.esker.so
          ESKER_WEB_URL: https://esker.so
The test step is optional — push runs the fixture gate anyway — but failing fast in a separate step makes the CI log easier to read.
See also
- Compatibility — the diff classification the gate enforces
- Manifests — what gets recorded per release
- Fixtures — the pre-publish gate
- Auth — credentials, CI patterns, hub URLs