Gates and falsifiers

How we log honesty in the UNI ledger.

Every public claim UNI makes points to a row in a ledger. If you cannot find the row, the claim is not ours to make. This post shows what a row looks like, how it is evidence-classed, and how a reader can audit any claim we publish.

The ledger is the single source of truth.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails. The ledger is how you do that inspection. It is an append-only record of what we observed, what tool produced the observation, when, and how strongly it supports the claim we attach to it.

We designed the ledger to be boring. Boring is the point. A row lands, it is signed, it is timestamped, it carries an evidence class, and it either survives red-team review or it gets a superseded-by pointer to the row that corrected it. Nothing is deleted. If we were wrong on Tuesday, Tuesday's row stays visible and Wednesday's row explains what changed.
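The append-only behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual UNI implementation: the Row and Ledger names are hypothetical, signing and timestamps are omitted, and the only field ever changed after a row lands is its superseded_by pointer.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical, trimmed-down row; the real schema has more fields.
    entry_id: str
    observation: str
    supersedes: Optional[str] = None
    superseded_by: Optional[str] = None


class Ledger:
    """Append-only store: rows are added and pointed at, never deleted."""

    def __init__(self):
        self._rows = {}  # entry_id -> Row, in insertion order

    def append(self, row):
        if row.entry_id in self._rows:
            raise ValueError(f"{row.entry_id} already exists")
        if row.supersedes is not None:
            old = self._rows.get(row.supersedes)
            if old is None:
                raise KeyError(f"cannot supersede unknown row {row.supersedes}")
            if old.superseded_by is not None:
                raise ValueError(f"{old.entry_id} is already superseded")
            # The old row stays visible; it only gains a forward pointer.
            self._rows[old.entry_id] = replace(old, superseded_by=row.entry_id)
        self._rows[row.entry_id] = row

    def get(self, entry_id):
        return self._rows[entry_id]

    def __len__(self):
        return len(self._rows)


ledger = Ledger()
ledger.append(Row("L-TUE", "what we believed on Tuesday"))
ledger.append(Row("L-WED", "what changed on Wednesday", supersedes="L-TUE"))
```

After the second append, both rows are still in the ledger, and the Tuesday row points forward to the Wednesday row that corrected it.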

What a row looks like.

Here is a lightly redacted example of one ledger entry, taken from the Cell Lab pre-registered benchmark run. Field names match the on-disk schema (Class B).

    entry_id: L-2026-06-14-0417
    timestamp: 2026-06-14T04:17:33Z
    subject: cell-lab, disturbance_family=bad_deploy, depth=2, seeds=6, ticks=80
    observation: UNI RecoveryScore median 0.937 vs random 0.621 (95% CI paired diff excludes 0)
    tool: cell_lab.run_sweep, cache commit a3f2c1d
    evidence_class: E (test outcome from a pre-registered run)
    supports_claim: C-UNI-CELL-01 (UNI beats random on bad_deploy)
    falsifier: if paired median diff CI includes 0 across 3 fresh seeds, C-UNI-CELL-01 fails
    reviewer: red-team pass 2026-06-16, no objection
    supersedes: (none)
    superseded_by: (none)

Read the row top-down. You see the specific claim it supports, the tool that produced the numbers, the evidence class we assigned it, and the exact condition that would kill the claim if it fires. If the falsifier ever triggers on a fresh run, a new row lands, this row gets a superseded_by pointer, and the public page that cited it is updated within the same session.
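The falsifier in the example row is a mechanical check, so it can be sketched in code. The version below is an assumption-laden illustration, not the cell_lab tooling: it uses a percentile bootstrap over the median of paired score differences, the function names are hypothetical, and how scores from the three fresh seeds are pooled is our guess. The scores are made up and are not ledger data.

```python
import random
import statistics


def paired_median_diff_ci(uni, baseline, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the median of paired differences."""
    if len(uni) != len(baseline) or not uni:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [u - b for u, b in zip(uni, baseline)]
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    # Resample the paired differences with replacement, n_boot times.
    medians = sorted(
        statistics.median(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_boot)
    )
    lo = medians[int(alpha / 2 * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def falsifier_fires(uni, baseline):
    """True when the CI includes 0, which is the condition that kills the claim."""
    lo, hi = paired_median_diff_ci(uni, baseline)
    return lo <= 0.0 <= hi


# Made-up scores for illustration only; these are not ledger data.
fresh_uni = [0.91, 0.94, 0.95]
fresh_random = [0.60, 0.66, 0.58]
print(falsifier_fires(fresh_uni, fresh_random))
```

If falsifier_fires returns True on a fresh run, that is the moment a new row lands and the old row gets its superseded_by pointer.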

Evidence classes, in one paragraph.

We tag each row with one of a small set of evidence classes so a reader can weight it correctly. A is empirical-in-session: something we saw happen live and captured. B is code or configuration you can inspect on disk. C is a configuration or integration state, a wiring you can verify. E is a test outcome from a pre-registered run, as in the row above, or a citation to expert literature, for example the Parr, Pezzulo and Friston (2022) treatment of variational free energy on POMDPs (Class E). F means a falsifier is present and named. U means unverified: a claim we want to make but cannot yet ground. A row with only U cannot support a public claim; it can only support the note that we are still looking. The mapping between these tags and how strongly we permit ourselves to speak is covered in Evidence Classes A, E, and What They Mean.
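The tags and the U-only rule can be written down as a small lookup. This is a sketch under stated assumptions: the EvidenceClass enum and can_support_public_claim function are hypothetical names, and we assume a row's tags can be passed as a set. The description for E reflects both uses in this post, the pre-registered test outcome in the example row and the literature citation.

```python
from enum import Enum


class EvidenceClass(Enum):
    A = "empirical, observed and captured in session"
    B = "code or configuration inspectable on disk"
    C = "configuration or integration state"
    E = "pre-registered test outcome, or citation to expert literature"
    F = "falsifier present and named"
    U = "unverified"


def can_support_public_claim(tags):
    """A row needs at least one tag other than U to back a public claim."""
    tags = set(tags)
    if not tags:
        return False  # an untagged row supports nothing
    return tags != {EvidenceClass.U}


print(can_support_public_claim({EvidenceClass.U}))
print(can_support_public_claim({EvidenceClass.E, EvidenceClass.F}))
```

The first call is the "we are still looking" case; the second is a row that is allowed to back a sentence on the site.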

How a public claim traces back.

Every public claim on the site carries an ID like C-UNI-CELL-01. That ID resolves to one or more ledger rows. The sentence on The Science that says UNI beats a random controller on bad_deploy, with a bootstrap confidence interval excluding zero, is C-UNI-CELL-01 (Class E). The sentence that says the preprint is not yet peer reviewed resolves to a status row we refresh whenever the paper's status changes (Class C). The sentence that says active inference uses variational free energy as an upper bound on surprise cites Parr, Pezzulo and Friston (2022) and carries Class E. If a claim on the site has no ID, or the ID does not resolve, that is a bug and we want to hear about it.
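Resolving a claim ID to its rows is a plain index lookup. The sketch below assumes rows are available as dictionaries with entry_id and supports_claim keys; build_index and resolve are hypothetical helper names, not part of any published UNI tooling.

```python
from collections import defaultdict


def build_index(rows):
    """Map each claim ID to the ledger rows that support it."""
    index = defaultdict(list)
    for row in rows:
        index[row["supports_claim"]].append(row["entry_id"])
    return dict(index)


def resolve(claim_id, index):
    """Return supporting row IDs, or raise so the gap is reported as a bug."""
    if not claim_id:
        raise LookupError("claim has no ID: report this as a bug")
    entry_ids = index.get(claim_id)
    if not entry_ids:
        raise LookupError(f"{claim_id} does not resolve: report this as a bug")
    return entry_ids


rows = [
    {"entry_id": "L-2026-06-14-0417", "supports_claim": "C-UNI-CELL-01"},
]
index = build_index(rows)
print(resolve("C-UNI-CELL-01", index))
```

Raising on a missing or unresolved ID, rather than returning an empty list, keeps the failure loud, which matches the rule that an untraceable claim is a bug.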

How a reader audits a claim.

Pick any sentence on The Science, note its claim ID, and follow it through /transparency to the ledger row. Read the row. Read the falsifier. If the claim survives the falsifier and the row is not superseded, the sentence stands. If the row is superseded, follow the pointer to the row that replaced it, and the public sentence should already reflect the newer row. If it does not, that is also a bug. The whole point of the ledger is that gaps are visible, not that gaps never happen.
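The audit walk above can be sketched as a short function that follows superseded_by pointers to the newest row. Everything here is illustrative: rows are plain dictionaries keyed by entry_id, the falsifier_fired flag is a hypothetical field that does not appear in the schema shown earlier, and the returned strings are our own labels.

```python
def current_row(entry_id, rows):
    """Follow superseded_by pointers until reaching a row with none."""
    seen = set()
    while True:
        if entry_id in seen:
            raise ValueError(f"superseded_by cycle at {entry_id}")
        seen.add(entry_id)
        row = rows[entry_id]
        nxt = row.get("superseded_by")
        if nxt is None:
            return row
        entry_id = nxt


def audit(cited_entry_id, rows):
    """Classify a public sentence by the row it cites."""
    latest = current_row(cited_entry_id, rows)
    if latest["entry_id"] != cited_entry_id:
        # The page cites an old row; it should already reflect the newer one.
        return f"stale: page should reflect {latest['entry_id']}"
    if latest.get("falsifier_fired", False):  # hypothetical flag
        return "fails: falsifier fired"
    return "stands"


rows = {
    "L-OLD": {"entry_id": "L-OLD", "superseded_by": "L-NEW"},
    "L-NEW": {"entry_id": "L-NEW", "superseded_by": None},
}
print(audit("L-NEW", rows))
print(audit("L-OLD", rows))
```

A "stale" result corresponds to the bug case in the text: the ledger has moved on, but the public sentence has not.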

Why we run it this way.

The alternative is a marketing site that gets to define its own truth. That is exactly the extraction pattern we are trying not to become. Running a public ledger means we occasionally have to publish a row that says we were wrong. We would rather do that than be impressive at the cost of being auditable. If you find a claim on any UNI page that does not trace back to a row, tell us. We treat those reports as red-team wins and log them in the same ledger.