Caching

Every disk path the SDK touches and the env var that overrides it.

All of these values come from esker.config accessors, which re-read the environment on every call (no in-process caching), so a changed env var takes effect on the next call.

Paths

function                 default                  env var                      what
cache_dir()              ./cache                  ESKER_CACHE_DIR              source-fetch cache (fetch_cached)
consumer_cache_dir()     ~/.esker/cache           ESKER_CONSUMER_CACHE_DIR     downloaded-dataset cache (esker.get)
output_dir()             ./output                 ESKER_OUTPUT_DIR             default for pipeline.run() and CLI --output
hub_url()                http://localhost:3001    ESKER_HUB_URL                hub API base
web_url()                http://localhost:3000    ESKER_WEB_URL                hub web (used by login)
credentials_path()       ~/.esker/credentials     ESKER_CREDENTIALS_PATH       login token, email, handle
global_bindings_path()   ~/.esker/config.toml     ESKER_GLOBAL_BINDINGS_PATH   global [datasets] map
http_timeout()           60 (seconds)             ESKER_HTTP_TIMEOUT           every outbound HTTP timeout
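
To make the "re-read on every call" behavior concrete, here is a minimal sketch that mimics three of the accessors with plain environment lookups. The function bodies are assumptions for illustration only; the real esker.config code may differ, for example in how it expands ~ or validates values.

```python
import os
from pathlib import Path


def cache_dir() -> Path:
    # Looked up on each call, so changing the env var takes effect immediately.
    return Path(os.environ.get("ESKER_CACHE_DIR", "./cache"))


def consumer_cache_dir() -> Path:
    # Assumption: "~" in the value is expanded to the user's home directory.
    raw = os.environ.get("ESKER_CONSUMER_CACHE_DIR", "~/.esker/cache")
    return Path(raw).expanduser()


def http_timeout() -> float:
    # Assumption: the value is parsed as a number of seconds.
    return float(os.environ.get("ESKER_HTTP_TIMEOUT", "60"))
```

Because nothing is memoized, a test can point the SDK at a temporary directory by setting the env var just before the call it cares about.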

Two cache dirs, two purposes

These are easy to confuse, but they are not the same: one serves pipeline authors, the other serves dataset consumers.

cache_dir() — pipeline-author cache

Used by EskerSource.fetch_cached(id). Layout:

./cache/
└── <SOURCE_ID>/
    └── <safe_id>.json    JSON envelope: {"raw": ..., "source_url": ..., "fetched_at": "..."}

safe_id replaces / and : with _. Other unsafe filesystem chars (*, ?) aren't handled.
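
The replacement rule can be sketched as below. This is an illustrative re-implementation, not the SDK's own code; the helper name cache_path and the sample ID are hypothetical.

```python
from pathlib import Path


def safe_id(record_id: str) -> str:
    # Only "/" and ":" are replaced; characters such as "*" or "?" pass through.
    return record_id.replace("/", "_").replace(":", "_")


def cache_path(cache_root: Path, source_id: str, record_id: str) -> Path:
    # Mirrors the layout above: <cache_dir>/<SOURCE_ID>/<safe_id>.json
    return cache_root / source_id / f"{safe_id(record_id)}.json"
```

One consequence of this rule, if the sketch matches the real behavior: IDs that differ only in / versus : (such as a/b and a:b) map to the same file name.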

Bulk sources don't use this — they re-fetch the whole payload on every run. Per-id sources use it to avoid re-hitting their origin.
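
A per-id read-through cache along these lines might look like the following sketch. The function name, the fetch callable, and the ISO-8601 UTC format for fetched_at are assumptions; only the three envelope keys come from the layout above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable


def fetch_cached_sketch(path: Path, source_url: str, fetch: Callable[[str], Any]) -> Any:
    # Cache hit: return the stored payload without contacting the origin.
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["raw"]

    # Cache miss: fetch once, then persist the envelope for later runs.
    raw = fetch(source_url)
    envelope = {
        "raw": raw,
        "source_url": source_url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(envelope), encoding="utf-8")
    return raw
```

The second call for the same path never invokes fetch, which is the point of the cache for per-id sources.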

consumer_cache_dir() — dataset-consumer cache

Used by esker.get(ref). Layout:

~/.esker/cache/
└── <owner>/
    └── <domain_id>/
        └── <schema_version>/
            └── data.parquet

compute_content_hash runs on every esker.get call — even cached files are verified against the manifest. See Reading for the verification cost.
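
The verify-on-every-read pattern can be sketched as follows. SHA-256 is used here purely as a stand-in: this page does not say which algorithm compute_content_hash uses, and all function names in the sketch are hypothetical.

```python
import hashlib
from pathlib import Path


def consumer_cache_path(root: Path, owner: str, domain_id: str, schema_version: str) -> Path:
    # Mirrors the layout above: <root>/<owner>/<domain_id>/<schema_version>/data.parquet
    return root / owner / domain_id / schema_version / "data.parquet"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in chunks so large parquet files don't load into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cache_valid(path: Path, expected_hash: str) -> bool:
    # A cached file counts only if it exists and matches the manifest's hash.
    return path.exists() and sha256_of_file(path) == expected_hash
```

Hashing reads the whole file, so the cost of a cache hit grows with dataset size.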

Output dir

CLI and library share one default: output_dir() (./output). Override via ESKER_OUTPUT_DIR or --output.

  • CLI: each --output flag defaults to output_dir(), so ESKER_OUTPUT_DIR applies whenever the flag is omitted.
  • Library: EskerPipeline.run() falls back to output_dir() when no output_dir= is passed.

Mixing CLI invocations and direct library calls writes to the same place.

After esker run (or pipeline.run()):

<output_dir>/
├── <DOMAIN_ID>.parquet
└── <DOMAIN_ID>.lineage.json

Filenames are unconditional — they always equal the dotted DOMAIN_ID. Re-running overwrites in place.
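
The resolution order and file naming can be sketched like this. Both helper names and the sample DOMAIN_ID are hypothetical; the sketch only restates the rules above (explicit value first, then ESKER_OUTPUT_DIR, then ./output).

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def resolve_output_dir(explicit: Optional[str] = None) -> Path:
    # An explicit --output / output_= value wins; otherwise fall back to the env var,
    # then to the documented default.
    if explicit is not None:
        return Path(explicit)
    return Path(os.environ.get("ESKER_OUTPUT_DIR", "./output"))


def output_files(domain_id: str, explicit: Optional[str] = None) -> Tuple[Path, Path]:
    # File names are always derived from the dotted DOMAIN_ID.
    out = resolve_output_dir(explicit)
    return out / f"{domain_id}.parquet", out / f"{domain_id}.lineage.json"
```

For a DOMAIN_ID of org.example.items and no overrides, this yields output/org.example.items.parquet and output/org.example.items.lineage.json.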

Auth files

~/.esker/credentials: JSON with token, user_email, owner_handle, expires_at. Mode 0600 (best-effort).

~/.esker/config.toml: TOML with a [datasets] table mapping bare names to <owner>/<name> (no version pinning at this scope).

The ~/.esker/ directory is created on first write, not eagerly.
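
Lazy directory creation and the best-effort 0600 mode can be sketched as below. The function name and payload shape are assumptions; the SDK's actual write path may differ.

```python
import json
import os
from pathlib import Path


def write_credentials(path: Path, payload: dict) -> None:
    # Create the parent directory only when something is actually written.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload), encoding="utf-8")
    try:
        # Restrict the file to owner read/write.
        os.chmod(path, 0o600)
    except OSError:
        # Best-effort: some platforms and filesystems don't support POSIX modes.
        pass
```

"Best-effort" here means a failed chmod is swallowed rather than aborting the login.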

Hub URL defaults are localhost

Out of the box the SDK assumes you're running esker-hub locally:

ESKER_HUB_URL = http://localhost:3001
ESKER_WEB_URL = http://localhost:3000

For production, set ESKER_HUB_URL=https://hub.esker.so (or wherever) and ESKER_WEB_URL=https://esker.so in your environment. There's no .env file convention; just env vars.

HTTP timeout

ESKER_HTTP_TIMEOUT (default 60s) applies to every outbound HTTP call:

  • Source fetches in BulkJsonSource / BulkCsvSource.
  • Hub API calls (fetch_manifest, download_artifact_to, upload_*, search_datasets).
  • auth.fetch_whoami (used by login and whoami).
  • The CLI commands that hand-roll requests (config set-handle, transfer, visibility).

A wedged origin trips the timeout and surfaces as <TypeName>: <msg> (e.g. RemoteDisconnected: Remote end closed connection without response).
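
A minimal sketch of how one timeout value can be threaded through a request, and how an exception maps to the <TypeName>: <msg> form. urllib is a stand-in here; this page does not say which HTTP client the SDK uses, and the helper names are hypothetical.

```python
import os
import urllib.request


def http_timeout() -> float:
    # Assumption: the env var is parsed as seconds, defaulting to 60.
    return float(os.environ.get("ESKER_HTTP_TIMEOUT", "60"))


def format_error(exc: BaseException) -> str:
    # Renders an exception in the "<TypeName>: <msg>" shape described above.
    return f"{type(exc).__name__}: {exc}"


def get_bytes(url: str) -> bytes:
    # The same timeout is passed to every outbound request.
    with urllib.request.urlopen(url, timeout=http_timeout()) as resp:
        return resp.read()
```

A caller would wrap get_bytes in try/except and print format_error(exc) to reproduce the one-line error style.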

Parquet outputs

What lands on disk after esker run:

  • data.parquet: record rows with the three injected esker_* columns.
  • data.lineage.json: a LineageBundle, one batch per unique (source_url, fetched_at).

After esker push, the hub additionally receives schema.json, schema.arrow, schema.d.ts, and manifest.json. Local files are unchanged from run.

After esker pull <ref>:

  • <output_dir>/<DOMAIN_ID>.parquet

One file. No lineage.json on pull: pull only fetches data.parquet. To get lineage, hit the artifact URL directly.

See also

  • Reading: esker.get and _ensure_local
  • Sources: fetch_cached mechanics
  • Auth: credentials file and login