Arrow & TypeScript artifacts
How JSON Schema becomes schema.arrow and schema.d.ts on every push.
Every esker push uploads two derived schema artifacts alongside the canonical schema.json: an Arrow IPC schema (schema.arrow) and a TypeScript interface (schema.d.ts). Both are pure functions of the published model's JSON Schema — the same input always produces the same bytes.
If you want to consume these artifacts yourself, the rules below tell you what to expect.
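To make "same input, same bytes" concrete, here is a minimal sketch of a determinism check on a JSON Schema. It assumes a sorted-key serialization with fixed separators; esker's actual serializer settings are not documented here, and `canonical_bytes` and `digest` are hypothetical helper names.

```python
import hashlib
import json


def canonical_bytes(schema: dict) -> bytes:
    """Serialize a JSON Schema dict with sorted keys and fixed separators."""
    # Assumption: sorted keys plus compact separators; not esker's confirmed settings.
    return json.dumps(schema, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(schema: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(schema)).hexdigest()


# Same content, different key insertion order.
a = {"type": "object", "properties": {"x": {"type": "integer"}}, "required": ["x"]}
b = {"required": ["x"], "properties": {"x": {"type": "integer"}}, "type": "object"}

same = digest(a) == digest(b)  # key order does not affect the bytes
```

Comparing digests of a downloaded artifact across two pushes of an unchanged model is one way a consumer could confirm that nothing drifted.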
Arrow
schema.arrow is the Arrow IPC-serialized pa.Schema for the published parquet. It is the same schema the parquet writer was opened with, so it matches the parquet file's column types and nullability exactly.
Type mapping
| JSON Schema | Arrow |
|---|---|
| string | string() |
| string + format: date-time | timestamp('us', tz='UTC') |
| string + format: date | date32() |
| integer | int64() |
| number | float64() |
| boolean | bool_() |
| array<inner> | list_(<inner>) |
| anyOf: [T, null] | <T>, field marked nullable |
Other string formats (uuid, uri) fall through to string(). Pydantic validates the format on construction; parquet stores plain text.
Anything not in the table — nested objects, unions other than T | null, etc. — raises ValueError("unmappable JSON Schema: {schema}"). The push fails before bytes leave your machine.
Nullability
A field is nullable iff it's not in the JSON Schema's required set:
pa.field(name, _arrow_type(prop), nullable=name not in required)
Same rule the parquet writer uses.
Use
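import pyarrow as pa
import requests
# Fetch the serialized schema and fail loudly on a non-2xx response.
r = requests.get("https://esker.so/archie/us.treasury.yields@2.0.0/schema.arrow", timeout=30)
r.raise_for_status()
# Decode the IPC message into a pa.Schema.
schema = pa.ipc.read_schema(pa.BufferReader(r.content))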
Hand schema to a ParquetWriter, an Arrow Flight reader, anything that wants a typed schema.
TypeScript
schema.d.ts is a single export interface <Name> { ... } declaration. Drop it into a TypeScript project and your records are typed.
Type mapping
| JSON Schema | TS |
|---|---|
| null | null |
| string | string |
| integer / number | number |
| boolean | boolean |
| array<inner> | <inner>[] |
| anyOf: [...] | <a> \| <b> \| ... |
| enum: [v1, v2] | "v1" \| "v2" (JSON-encoded literals) |
Required vs optional
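The JSON Schema's required list drives the TS ? marker: fields listed in required are emitted without it, and every other field is optional.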
export interface UsTreasuryYields {
quote_date: string;
rate_1m?: number;
rate_3m?: number;
// ...
esker_id: string;
esker_source_url: string;
esker_lineage_id: string;
}
Same rule as Arrow nullability.
Name stripping
The published model class is named Published<X> internally. The TS emitter strips the prefix so consumers see <X> — the domain-native name.
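The three TypeScript rules (type mapping, the ? marker, and prefix stripping) fit in a short sketch. It is written in Python to match the emitter side and is not esker's `to_typescript`: `ts_type` and `emit_interface` are hypothetical names, the two-space indentation, the parentheses around array-of-union types, and the error raised for unmapped shapes are all assumptions.

```python
import json

_PRIMITIVES = {
    "null": "null",
    "string": "string",
    "integer": "number",
    "number": "number",
    "boolean": "boolean",
}


def ts_type(prop: dict) -> str:
    """Map one JSON Schema property to a TypeScript type expression."""
    if "enum" in prop:
        # JSON-encode each value so strings come out as quoted literals.
        return " | ".join(json.dumps(v) for v in prop["enum"])
    if "anyOf" in prop:
        return " | ".join(ts_type(s) for s in prop["anyOf"])
    kind = prop.get("type")
    if kind in _PRIMITIVES:
        return _PRIMITIVES[kind]
    if kind == "array" and "items" in prop:
        inner = ts_type(prop["items"])
        # Assumption: parenthesize unions so `(a | b)[]` is not read as `a | b[]`.
        return f"({inner})[]" if " | " in inner else f"{inner}[]"
    # Assumption: the sketch rejects shapes the table does not list.
    raise ValueError(f"no TypeScript mapping for: {prop}")


def emit_interface(class_name: str, json_schema: dict) -> str:
    """Render `export interface <X> { ... }` for a Published<X> model."""
    name = class_name.removeprefix("Published")  # Python 3.9+
    required = set(json_schema.get("required", []))
    lines = [f"export interface {name} {{"]
    for field, prop in json_schema["properties"].items():
        marker = "" if field in required else "?"
        lines.append(f"  {field}{marker}: {ts_type(prop)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Made-up schema that mirrors a few fields of the interface shown earlier.
example = {
    "properties": {
        "quote_date": {"type": "string"},
        "rate_1m": {"type": "number"},
        "esker_id": {"type": "string"},
    },
    "required": ["quote_date", "esker_id"],
}
text = emit_interface("PublishedUsTreasuryYields", example)
```

Checking `enum` before `type` matters because an enum property usually carries both keys, and the literal union is the more specific type.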
Use
curl -O https://esker.so/archie/us.treasury.yields@2.0.0/schema.d.ts
Drop it into your project and import the interface. curl -O saves the file as schema.d.ts, so the import path is ./schema:
import type { UsTreasuryYields } from "./schema";
What lands per push
Six artifacts per esker push:
| artifact | source | mime |
|---|---|---|
| data.parquet | the records | application/vnd.apache.parquet |
| schema.json | Pydantic JSON Schema (sorted-key) | application/json |
| schema.arrow | to_arrow_bytes(model) | application/vnd.apache.arrow.file |
| schema.d.ts | to_typescript(model) | text/plain; charset=utf-8 |
| lineage.json | LineageBundle | application/json |
| manifest.json | DatasetManifest (POSTed) | application/json |
URL pattern:
GET https://esker.so/<owner>/<name>@<version>/<artifact>
Versionless URLs (<owner>/<name>/data.parquet) resolve to the latest published version.
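A small helper can build both URL forms from the pattern above. This is a sketch: `artifact_url` is a hypothetical name, not part of the esker API, and it does no percent-encoding because the dataset names shown here contain only URL-safe characters.

```python
from typing import Optional

BASE = "https://esker.so"


def artifact_url(owner: str, name: str, artifact: str, version: Optional[str] = None) -> str:
    """Return the artifact URL; omit `version` to target the latest published version."""
    dataset = f"{name}@{version}" if version else name
    return f"{BASE}/{owner}/{dataset}/{artifact}"


pinned = artifact_url("archie", "us.treasury.yields", "schema.arrow", version="2.0.0")
latest = artifact_url("archie", "us.treasury.yields", "data.parquet")
```

Pinning the version is the safer default for consumers that cache a schema, since a versionless URL can start returning a different schema after the next push.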
Programmatic emit
All three functions are part of the public API:
from esker import to_arrow_bytes, to_arrow_schema, to_typescript
arrow_schema = to_arrow_schema(MyPublishedModel)
arrow_bytes = to_arrow_bytes(MyPublishedModel)
ts_text = to_typescript(MyPublishedModel)
MyPublishedModel is MyModel.published() — the variant with the three injected fields. Calling these on the draft variant gives you a schema without esker_id / esker_source_url / esker_lineage_id.
See also
- Manifests: the artifact list lives there too
- Records: how published() works
- Publishing: the upload sequence