Research

Notes, papers, and the work in the open.

Each Volume ships with technical notes describing the architecture, training, and results. Open writeups go up as they're ready. The book is the narrative layer; this is the engineering layer.

On what we publish

The substrate is a trade secret. The results, methodology, ablations, and reasoning behind every design decision are open. Other groups should be able to evaluate our claims rigorously, even when they can't reproduce the substrate itself.

If you would benefit from reading a writeup that doesn't yet exist, write to us. The order in which we publish is influenced by who's reading.

What we measure

For each Volume, we report what the model can do, where it fails, and what we still need to understand. We try to include the failure modes: the binders that didn't bind, the structures the model couldn't predict, the parts of the domain that resisted the architecture. A model is more useful if you know where it doesn't reach.

We do not chase leaderboards. We report on benchmarks when they are the most honest way to communicate a capability, and we describe what we did differently when they aren't. Where the substrate makes a benchmark question malformed, we say so and propose what we think the better question is.

How to read this page

Items marked "in preparation" are being drafted for release alongside the Volume they belong to. "Drafting" means we have a working version we are unhappy with. "Planned" means we have the idea and have committed to it but have not started.

This page will grow. We will not retroactively edit published material; we will append corrections.