Arrow & TypeScript artifacts

How JSON Schema becomes schema.arrow and schema.d.ts on every push.

Every esker push uploads two derived schema artifacts alongside the canonical schema.json: an Arrow IPC schema (schema.arrow) and a TypeScript interface (schema.d.ts). Both are pure functions of the published model's JSON Schema — the same input always produces the same bytes.

If you want to consume these artifacts yourself, the rules below tell you what to expect.

Arrow

schema.arrow is the Arrow IPC-serialized pa.Schema for the published parquet. It is the same schema the parquet writer was opened with, so it matches the file's column types and nullability exactly.

Type mapping

JSON Schema                   Arrow
string                        string()
string + format: date-time    timestamp('us', tz='UTC')
string + format: date         date32()
integer                       int64()
number                        float64()
boolean                       bool_()
array<inner>                  list_(<inner>)
anyOf: [T, null]              <T>, field marked nullable

Other string formats (uuid, uri) fall through to string(). Pydantic validates the format on construction; parquet stores plain text.

Anything not in the table — nested objects, unions other than T | null, etc. — raises ValueError("unmappable JSON Schema: {schema}"). The push fails before bytes leave your machine.

Nullability

A field is nullable iff it's not in the JSON Schema's required set:

pa.field(name, _arrow_type(prop), nullable=name not in required)

The parquet writer applies the same rule.
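
As a worked example of the required-set rule, the sketch below computes the nullable flag for every property of a small schema. The helper name nullable_flags and the schema fragment are hypothetical; the fragment only borrows field names from the treasury yields example and is not the real published schema.

```python
def nullable_flags(schema: dict) -> dict[str, bool]:
    """Return {field name: nullable?} for a JSON Schema object (hypothetical helper)."""
    required = set(schema.get("required", []))
    # A field is nullable exactly when it is absent from the required set.
    return {name: name not in required for name in schema.get("properties", {})}


# Illustrative fragment, not the real us.treasury.yields schema.
example = {
    "type": "object",
    "properties": {
        "quote_date": {"type": "string", "format": "date"},
        "rate_1m": {"anyOf": [{"type": "number"}, {"type": "null"}]},
        "esker_id": {"type": "string"},
    },
    "required": ["quote_date", "esker_id"],
}

flags = nullable_flags(example)
```

Here flags marks rate_1m as nullable and the two required fields as non-nullable.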

Use

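import pyarrow as pa
import requests

# Fetch the serialized schema; fail on a 4xx/5xx instead of parsing an error page.
r = requests.get("https://esker.so/archie/us.treasury.yields@2.0.0/schema.arrow")
r.raise_for_status()

# Decode the IPC bytes back into a pa.Schema.
schema = pa.ipc.read_schema(pa.BufferReader(r.content))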

Hand schema to a ParquetWriter, an Arrow Flight reader, or anything else that wants a typed schema.

TypeScript

schema.d.ts is a single export interface <Name> { ... } declaration. Drop it into a TypeScript project and your records are typed.

Type mapping

JSON Schema         TS
null                null
string              string
integer / number    number
boolean             boolean
array<inner>        <inner>[]
anyOf: [...]        <a> | <b> | ...
enum: [v1, v2]      "v1" | "v2" (JSON-encoded literals)
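
The TypeScript mapping can be sketched in Python as a small recursive function. This is an illustration of the table, not the emitter's source: the name ts_type is hypothetical, and two behaviors are assumptions the table does not state, namely parenthesizing a union inside an array and raising on anything unlisted.

```python
import json

# JSON Schema primitives and the TS type each one maps to.
_PRIMITIVES = {
    "null": "null",
    "string": "string",
    "integer": "number",
    "number": "number",
    "boolean": "boolean",
}


def ts_type(prop: dict) -> str:
    """Map one JSON Schema property to a TS type expression (hypothetical helper)."""
    if "enum" in prop:
        # JSON-encode each value so strings come out quoted: "v1" | "v2".
        return " | ".join(json.dumps(v) for v in prop["enum"])
    if "anyOf" in prop:
        return " | ".join(ts_type(branch) for branch in prop["anyOf"])

    kind = prop.get("type")
    if kind in _PRIMITIVES:
        return _PRIMITIVES[kind]
    if kind == "array" and "items" in prop:
        inner = ts_type(prop["items"])
        # Assumption: wrap unions so (a | b)[] is not read as a | b[].
        return f"({inner})[]" if " | " in inner else f"{inner}[]"
    # Assumption: unlisted shapes are rejected rather than emitted as `any`.
    raise ValueError(f"unmappable JSON Schema: {prop}")
```

For example, ts_type({"anyOf": [{"type": "number"}, {"type": "null"}]}) yields the string number | null.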

Required vs optional

required drives the TS ? marker. Fields listed in required are emitted without it, and every other field gets a ?:

export interface UsTreasuryYields {
  quote_date: string;
  rate_1m?: number;
  rate_3m?: number;
  // ...
  esker_id: string;
  esker_source_url: string;
  esker_lineage_id: string;
}

Same rule as Arrow nullability.
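
A minimal sketch of how the ? marker could be applied when rendering the interface. The function render_interface is hypothetical and takes TS types that have already been mapped, so it shows only the required-versus-optional step.

```python
def render_interface(name: str, fields: dict[str, str], required: set[str]) -> str:
    """Render an `export interface` declaration (hypothetical helper).

    fields maps each property name to an already-mapped TS type string.
    """
    lines = [f"export interface {name} {{"]
    for field, ts in fields.items():
        # Required fields get no marker; everything else is optional.
        marker = "" if field in required else "?"
        lines.append(f"  {field}{marker}: {ts};")
    lines.append("}")
    return "\n".join(lines)


declaration = render_interface(
    "UsTreasuryYields",
    {"quote_date": "string", "rate_1m": "number", "esker_id": "string"},
    required={"quote_date", "esker_id"},
)
```

With these inputs, only rate_1m is rendered with the ? marker.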

Name stripping

The published model class is named Published<X> internally. The TS emitter strips the prefix so consumers see <X> — the domain-native name.
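
The prefix stripping amounts to a one-line transformation. The sketch below assumes the prefix is the literal string Published and that names without it pass through unchanged; public_name is a hypothetical helper.

```python
def public_name(class_name: str) -> str:
    """Strip the internal Published prefix, if present (hypothetical helper)."""
    # str.removeprefix needs Python 3.9+; it is a no-op when the prefix is absent.
    return class_name.removeprefix("Published")
```

Under these assumptions, public_name("PublishedUsTreasuryYields") gives "UsTreasuryYields", the name used in the interface above.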

Use

curl -o us.treasury.yields.d.ts https://esker.so/archie/us.treasury.yields@2.0.0/schema.d.ts

Save the file under a name that matches your import path, then import the type:

import type { UsTreasuryYields } from "./us.treasury.yields";

What lands per push

Six artifacts per esker push:

artifact         source                               MIME type
data.parquet     the records                          application/vnd.apache.parquet
schema.json      Pydantic JSON Schema (sorted-key)    application/json
schema.arrow     to_arrow_bytes(model)                application/vnd.apache.arrow.file
schema.d.ts      to_typescript(model)                 text/plain; charset=utf-8
lineage.json     LineageBundle                        application/json
manifest.json    DatasetManifest (POSTed)             application/json

URL pattern:

GET https://esker.so/<owner>/<name>@<version>/<artifact>

Versionless URLs (<owner>/<name>/data.parquet) resolve to the latest published version.
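
The URL pattern is simple enough to build by hand, but a small helper keeps the versioned and versionless forms consistent. This is a sketch: artifact_url and the ARTIFACTS tuple are hypothetical names, and the artifact list is copied from the table above.

```python
BASE = "https://esker.so"

# The six artifact names listed in the table above.
ARTIFACTS = (
    "data.parquet",
    "schema.json",
    "schema.arrow",
    "schema.d.ts",
    "lineage.json",
    "manifest.json",
)


def artifact_url(owner: str, name: str, artifact: str, version=None) -> str:
    """Build the GET URL for one artifact (hypothetical helper)."""
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact}")
    # Omitting the version produces the versionless form, which resolves to latest.
    dataset = name if version is None else f"{name}@{version}"
    return f"{BASE}/{owner}/{dataset}/{artifact}"
```

Pinning a version is the safer default for anything reproducible, since the versionless form changes meaning whenever a new version is published.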

Programmatic emit

All three emitters are part of the public API:

from esker import to_arrow_bytes, to_arrow_schema, to_typescript

arrow_schema = to_arrow_schema(MyPublishedModel)
arrow_bytes = to_arrow_bytes(MyPublishedModel)
ts_text = to_typescript(MyPublishedModel)

MyPublishedModel is MyModel.published() — the variant with the three injected fields. Calling these on the draft variant gives you a schema without esker_id / esker_source_url / esker_lineage_id.

See also