# Arrow & TypeScript artifacts

> How JSON Schema becomes schema.arrow and schema.d.ts on every push.

Every `esker push` uploads two derived schema artifacts alongside the canonical `schema.json`: an Arrow IPC schema (`schema.arrow`) and a TypeScript interface (`schema.d.ts`). Both are pure functions of the published model's JSON Schema — the same input always produces the same bytes.

If you want to consume these artifacts yourself, the rules below tell you what to expect.
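
Because the artifacts are deterministic, a consumer can fingerprint a schema and detect changes between versions. A minimal sketch, assuming sorted-key JSON with compact separators as the canonical form; the helper names are hypothetical, and the exact byte layout esker uses for `schema.json` is not specified here:

```python
import hashlib
import json


def canonical_schema_bytes(schema: dict) -> bytes:
    """Serialize with sorted keys so dict ordering cannot change the bytes.

    Assumption: compact separators and UTF-8; esker's own layout may differ.
    """
    return json.dumps(schema, sort_keys=True, separators=(",", ":")).encode("utf-8")


def schema_digest(schema: dict) -> str:
    """SHA-256 hex digest of the canonical bytes."""
    return hashlib.sha256(canonical_schema_bytes(schema)).hexdigest()


# Two dicts with the same content but different key order.
a = {"type": "object", "properties": {"x": {"type": "integer"}}, "required": ["x"]}
b = {"required": ["x"], "properties": {"x": {"type": "integer"}}, "type": "object"}

assert schema_digest(a) == schema_digest(b)
```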

## Arrow

`schema.arrow` is the Arrow IPC-serialized `pa.Schema` for the published parquet. It is the same schema the parquet writer was opened with, so its column types and nullability match the parquet file exactly.

### Type mapping

| JSON Schema                    | Arrow                        |
| ------------------------------ | ---------------------------- |
| `string`                       | `string()`                   |
| `string` + `format: date-time` | `timestamp('us', tz='UTC')`  |
| `string` + `format: date`      | `date32()`                   |
| `integer`                      | `int64()`                    |
| `number`                       | `float64()`                  |
| `boolean`                      | `bool_()`                    |
| `array<inner>`                 | `list_(<inner>)`             |
| `anyOf: [T, null]`             | `<T>`, field marked nullable |

Other `string` formats (`uuid`, `uri`) fall through to `string()`. Pydantic validates the format on construction; parquet stores plain text.

Anything not in the table — nested objects, unions other than `T | null`, etc. — raises `ValueError("unmappable JSON Schema: {schema}")`. The push fails before bytes leave your machine.

### Nullability

A field is nullable iff it's not in the JSON Schema's `required` set:

```python
pa.field(name, _arrow_type(prop), nullable=name not in required)
```

Same rule the parquet writer uses.

### Use

```python
import pyarrow as pa
import requests

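# Fetch the Arrow schema artifact for a pinned dataset version.
r = requests.get("https://esker.so/archie/us.treasury.yields@2.0.0/schema.arrow")
r.raise_for_status()  # fail on a 4xx/5xx instead of parsing an error body

# Deserialize the IPC-serialized bytes into a pa.Schema.
schema = pa.ipc.read_schema(pa.BufferReader(r.content))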
```

Hand `schema` to a `ParquetWriter`, an Arrow Flight reader, anything that wants a typed schema.

## TypeScript

`schema.d.ts` is a single `export interface <Name> { ... }` declaration. Drop it into a TypeScript project and your records are typed.

### Type mapping

| JSON Schema          | TS                                     |
| -------------------- | -------------------------------------- |
| `null`               | `null`                                 |
| `string`             | `string`                               |
| `integer` / `number` | `number`                               |
| `boolean`            | `boolean`                              |
| `array<inner>`       | `<inner>[]`                            |
| `anyOf: [...]`       | `<a> \| <b> \| ...`                    |
| `enum: [v1, v2]`     | `"v1" \| "v2"` (JSON-encoded literals) |
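
A sketch of the table as a recursive Python function. `ts_type` is a hypothetical name, not the emitter's real entry point. Two behaviours here are assumptions: parenthesizing unions inside arrays, and raising `ValueError` for anything outside the table.

```python
import json

# JSON Schema primitives and their TypeScript counterparts from the table.
_PRIMITIVES = {
    "null": "null",
    "string": "string",
    "integer": "number",
    "number": "number",
    "boolean": "boolean",
}


def ts_type(schema: dict) -> str:
    """Hypothetical mapper that follows the table above."""
    if "enum" in schema:
        # JSON-encode each value so strings come out quoted.
        return " | ".join(json.dumps(v) for v in schema["enum"])
    if "anyOf" in schema:
        return " | ".join(ts_type(s) for s in schema["anyOf"])
    kind = schema.get("type")
    if kind in _PRIMITIVES:
        return _PRIMITIVES[kind]
    if kind == "array" and "items" in schema:
        inner = ts_type(schema["items"])
        # Assumption: wrap unions so `(string | null)[]` is not read as `string | null[]`.
        return f"({inner})[]" if " | " in inner else f"{inner}[]"
    # Assumption: unsupported schemas are rejected rather than typed as `any`.
    raise ValueError(f"unmappable JSON Schema: {schema}")
```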

### Required vs optional

`required` drives the TS `?` marker:

```ts
export interface UsTreasuryYields {
  quote_date: string;
  rate_1m?: number;
  rate_3m?: number;
  // ...
  esker_id: string;
  esker_source_url: string;
  esker_lineage_id: string;
}
```

Same rule as Arrow nullability.
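
The `?` rule is small enough to sketch in a few lines of Python. `render_interface` is a hypothetical helper, not the real emitter; it takes already-mapped TS types and assumes every field name is a valid TypeScript identifier:

```python
def render_interface(name: str, fields: dict, required: set) -> str:
    """Render one `export interface` block; fields outside `required` get `?`."""
    lines = [f"export interface {name} {{"]
    for field, ts in fields.items():
        marker = "" if field in required else "?"
        lines.append(f"  {field}{marker}: {ts};")
    lines.append("}")
    return "\n".join(lines)


text = render_interface(
    "UsTreasuryYields",
    {"quote_date": "string", "rate_1m": "number", "esker_id": "string"},
    required={"quote_date", "esker_id"},
)
print(text)
```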

### Name stripping

The published model class is named `Published<X>` internally. The TS emitter strips the prefix so consumers see `<X>` — the domain-native name.
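
A minimal sketch of the stripping rule, with a hypothetical helper name. Leaving a bare `Published`, or a name without the prefix, unchanged is an assumption; the text above does not say what the emitter does in those cases:

```python
def strip_published_prefix(class_name: str) -> str:
    """Drop a leading `Published` so consumers see the domain-native name."""
    prefix = "Published"
    # Assumption: never strip down to an empty name.
    if class_name.startswith(prefix) and len(class_name) > len(prefix):
        return class_name[len(prefix):]
    return class_name
```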

### Use

```sh
curl -o us.treasury.yields.d.ts https://esker.so/archie/us.treasury.yields@2.0.0/schema.d.ts
```

Drop into your project, import:

```ts
import type { UsTreasuryYields } from "./us.treasury.yields";
```

## What lands per push

Six artifacts per `esker push`:

| artifact        | source                            | mime                                |
| --------------- | --------------------------------- | ----------------------------------- |
| `data.parquet`  | the records                       | `application/vnd.apache.parquet`    |
| `schema.json`   | Pydantic JSON Schema (sorted-key) | `application/json`                  |
| `schema.arrow`  | `to_arrow_bytes(model)`           | `application/vnd.apache.arrow.file` |
| `schema.d.ts`   | `to_typescript(model)`            | `text/plain; charset=utf-8`         |
| `lineage.json`  | `LineageBundle`                   | `application/json`                  |
| `manifest.json` | `DatasetManifest` (POSTed)        | `application/json`                  |

URL pattern:

```
GET https://esker.so/<owner>/<name>@<version>/<artifact>
```

Versionless URLs (`<owner>/<name>/data.parquet`) resolve to the latest published version.
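
A small sketch of building these URLs from their parts. `artifact_url` and `BASE_URL` are hypothetical names; the pattern itself comes from the lines above:

```python
from typing import Optional

BASE_URL = "https://esker.so"


def artifact_url(owner: str, name: str, artifact: str, version: Optional[str] = None) -> str:
    """Build an artifact URL; leaving out `version` yields the latest-version form."""
    dataset = f"{name}@{version}" if version else name
    return f"{BASE_URL}/{owner}/{dataset}/{artifact}"


pinned = artifact_url("archie", "us.treasury.yields", "schema.arrow", version="2.0.0")
latest = artifact_url("archie", "us.treasury.yields", "data.parquet")
```

Pin the version for anything reproducible; the versionless form changes meaning whenever a new version is published.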

## Programmatic emit

All three emitter functions are part of the public API:

```python
from esker import to_arrow_bytes, to_arrow_schema, to_typescript

arrow_schema = to_arrow_schema(MyPublishedModel)
arrow_bytes = to_arrow_bytes(MyPublishedModel)
ts_text = to_typescript(MyPublishedModel)
```

`MyPublishedModel` is `MyModel.published()` — the variant with the three injected fields. Calling these on the draft variant gives you a schema without `esker_id` / `esker_source_url` / `esker_lineage_id`.

## See also

- [Manifests](https://esker.so/docs/protocol/manifests.md) — the artifact list lives there too
- [Records](https://esker.so/docs/sdk/records.md) — how `published()` works
- [Publishing](https://esker.so/docs/sdk/publishing.md) — the upload sequence