What Would Falsify UNI (Universal Natural Intelligence): A Standing List

UNI is a working hypothesis about an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, tagged by evidence class, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

This page names the observations that would force us to revise or retract the hypothesis. It is standing, not retrospective. Every release cycle we update the list, the current status of each falsifier, and the evidence class we can attach to each row. If any single item lands cleanly against us, we will say so on this page before we say it anywhere else.

Ground rules for this list

A falsifier only counts if it is stated in advance, checkable by a third party, and tied to a class of evidence that a reader can inspect. Class E means expert citation, argued against a published source. Class C means configuration and integration state, checkable by reading the repo, the benchmark commit cache, or the running lab. We tag every row so no one has to trust our prose.

Two things this list is not. It is not a promise that UNI will pass every future test. It is not a proof that active inference is the correct theory of mind, of organizations, or of anything else. It is a public record of what would count as counter-evidence, published before the counter-evidence arrives.

The current standing list

1. Precision does not drive the behavioral regimes

The core claim behind the Precision Lab and Loop Lab is that sensory precision, transition precision, and policy temperature produce distinct, reproducible behavioral regimes in a POMDP active-inference agent, consistent with the formulation in Parr, Pezzulo and Friston, 2022 (Class E). If a careful sweep across those three dials, run on the committed labs, fails to produce distinct regimes, or produces regimes that do not track the bifurcation map shown in the Loop Lab, the claim is falsified as stated. Status: consistent with the sweep cache shipped with the labs (Class C), but not yet independently replicated.
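To make one of the three dials concrete, here is a minimal sketch of a sweep over policy temperature. It assumes the common softmax form in which policy probabilities are proportional to exp(-G / temperature), where G is expected free energy. The function names, the G values, and the temperature grid are hypothetical illustrations and are not taken from the labs' code.

```python
import math

def policy_posterior(expected_free_energy, temperature):
    """Softmax over negative expected free energy, scaled by 1/temperature."""
    scaled = [-g / temperature for g in expected_free_energy]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical expected free energies for three candidate policies.
G = [1.0, 1.5, 3.0]

# Sweep one dial and record how decisive the policy distribution is.
sweep = {t: entropy(policy_posterior(G, t)) for t in (0.1, 1.0, 10.0)}
for t, h in sweep.items():
    print(f"temperature={t}: policy entropy={h:.3f} nats")
```

Low temperature concentrates probability on the lowest-G policy, and high temperature pushes the distribution toward uniform. Policy entropy is only a one-dimensional proxy. A real test of this row would sweep all three dials on the committed labs and compare the resulting behavior against the Loop Lab's bifurcation map.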

2. The Cell Lab benchmark does not survive its own pre-registration

The Cell Lab was pre-registered with five claims and their falsification criteria written before the runs. Two of those claims already stand partially against us: a neural baseline wins overall on memory_leak and cpu_noisy_neighbor, and a rule-based controller wins on database_flaky. If, on a re-run with the committed seeds and depth, UNI drops below random on any disturbance family, or if the paired bootstrap confidence interval for the median difference against random crosses zero for four or more families, the "single active-inference controller carries useful signal" claim is retracted. Status: three losses acknowledged in the published table (Class C); no re-run has moved a claim past its stated threshold.
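The second criterion above can be sketched as follows. This is a minimal percentile bootstrap over paired differences, written under our own assumptions about resample count, interval level, and seeding; it is not the benchmark's committed analysis code. The family names and scores are synthetic placeholders, not Cell Lab results.

```python
import random
import statistics

def paired_bootstrap_ci(agent_scores, baseline_scores, n_resamples=2000,
                        alpha=0.05, seed=0):
    """Percentile CI for the median of paired differences (agent - baseline)."""
    if len(agent_scores) != len(baseline_scores):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(agent_scores, baseline_scores)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    medians = []
    for _ in range(n_resamples):
        resample = [rng.choice(diffs) for _ in diffs]  # draw with replacement
        medians.append(statistics.median(resample))
    medians.sort()
    # Approximate percentile positions in the sorted bootstrap medians.
    lo = medians[int((alpha / 2) * n_resamples)]
    hi = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def crosses_zero(ci):
    """True when zero lies inside the closed interval."""
    lo, hi = ci
    return lo <= 0.0 <= hi

def retraction_triggered(cis_by_family, threshold=4):
    """Apply the 'four or more families cross zero' rule from the text."""
    crossing = [name for name, ci in cis_by_family.items() if crosses_zero(ci)]
    return len(crossing) >= threshold, crossing

# Synthetic, illustrative scores only; these are not Cell Lab results.
toy_scores = {
    "family_a": ([2.0, 3.0, 2.5, 1.5, 3.5], [1.0, 1.0, 1.0, 1.0, 1.0]),
    "family_b": ([-2.0, -1.0, 0.0, 1.0, 2.0], [0.0, 0.0, 0.0, 0.0, 0.0]),
}
cis = {name: paired_bootstrap_ci(a, b) for name, (a, b) in toy_scores.items()}
triggered, crossing = retraction_triggered(cis)
print(cis, triggered, crossing)
```

The sketch covers only the interval rule. The first criterion, UNI dropping below random on any family, is a separate check and would need the benchmark's own scoring definition.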

3. The MCP server does not do what it says

The public MCP endpoint at /api/mcp advertises 16 tools split into headless and live-session groups. If an independent operator, using any conformant Model Context Protocol client, cannot reproduce the tool list, run an episode headless, or attach a session and step the agent, the "any LLM can call the lab" claim fails. Status: the endpoint is anonymous, unauthenticated, and version-pinned to the labs (Class C).
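The first of those checks, reproducing the tool list, can be sketched without a network call. The Model Context Protocol uses JSON-RPC 2.0, and tool discovery is the `tools/list` method. The helper names and the canned response below are hypothetical. A real client must also complete the protocol's initialization handshake and follow pagination if the server returns a cursor, and the host serving /api/mcp is left for the operator to supply.

```python
import json

EXPECTED_TOOL_COUNT = 16  # the count this page says the endpoint advertises

def tools_list_request(request_id=1):
    """JSON-RPC 2.0 request body for the MCP `tools/list` method."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})

def check_tools_response(response_text, expected=EXPECTED_TOOL_COUNT):
    """Return (ok, names) for the text of a `tools/list` response."""
    message = json.loads(response_text)
    if "error" in message:
        return False, []
    names = [tool["name"] for tool in message["result"]["tools"]]
    unique = len(set(names)) == len(names)
    return (len(names) == expected and unique), names

# Canned response with placeholder tool names, standing in for a live reply.
canned = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": f"placeholder_tool_{i}"} for i in range(16)]},
})
ok, names = check_tools_response(canned)
print(tools_list_request())
print(ok, len(names))
```

A count that matches is necessary but not sufficient for this row. The other two checks, running a headless episode and stepping a live session, depend on the tool schemas the endpoint actually returns.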

4. The preprint fails expert review on a load-bearing point

The Zenodo preprint (DOI 10.5281/zenodo.19785799) is marked unrefereed. Layer 2 expert review is pending. If a reviewer with standing in the active-inference literature identifies a load-bearing error in the free-energy derivation, the POMDP formulation, or the mapping from variational free energy to the labs, and that error is not repairable within the framework, we mark the affected section retracted on this page and in the preprint metadata (Class E). Status: no such review has landed; absence of review is not absence of error.

5. Autopoiesis language over-reaches

The Cell Lab uses the word autopoiesis in a narrow sense: viable-set maintenance under disturbance, not life. If a reader in good faith reads any UNI page and comes away thinking we are claiming biological life, consciousness, or general intelligence, our honesty fences have failed. That is a copy failure, and copy failures on this project count as falsifiers of the framing, not only the science. Status: fences are stated on the science page and in the paper; we treat any well-argued reader complaint as a bug against the copy.

How this page updates

Each release we do three things. We add any new falsifier the work generates. We update the status line under each existing row with the evidence class that changed. When a row moves against us, it stays on the page, marked, with a link to the observation that moved it. We would rather be correct than impressive.
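One way to picture the update rule is as a small record per row. The sketch below is our own illustration, not the site's data model; the class name, field names, and example link are hypothetical. It encodes two rules from the paragraph above: a status change keeps its history, and a row that moves against us is marked and linked rather than deleted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

VALID_CLASSES = ("E", "C")  # expert citation, configuration/integration state

@dataclass
class FalsifierRow:
    title: str
    status: str
    evidence_class: str
    moved_against: bool = False
    observation_link: Optional[str] = None
    history: List[Tuple[str, str]] = field(default_factory=list)

    def update_status(self, new_status: str, evidence_class: str) -> None:
        """Record the previous status, then apply the new one."""
        if evidence_class not in VALID_CLASSES:
            raise ValueError(f"unknown evidence class: {evidence_class}")
        self.history.append((self.status, self.evidence_class))
        self.status = new_status
        self.evidence_class = evidence_class

    def mark_against(self, observation_link: str) -> None:
        """Mark the row as moved against the hypothesis; a link is required."""
        if not observation_link:
            raise ValueError("a row moved against us must link its observation")
        self.moved_against = True
        self.observation_link = observation_link

# Hypothetical usage with placeholder text and a placeholder link.
row = FalsifierRow("Example falsifier", "no review has landed", "E")
row.update_status("example updated status", "C")
row.mark_against("https://example.org/observation")
print(row)
```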

For the underlying method of how we decide when we are wrong, see the companion post in this cluster, Gates and Falsifiers: How We Know When We Are Wrong. For the benchmark and the paper that anchor rows 1 through 4, see The Benchmark and the Paper: The Stratified Palimpsest. For the full record of what we run, what we log, and how a reader inspects it, see Transparency.
