Science in the Open: What We Mean
Science in the open is not a marketing posture. It is an operating discipline: publish the claim, publish the class of evidence supporting it, publish the observation that would sink it, and ship the code where the code can be shipped. Everything else is derivative.
What follows is the stance in four practical rules, the cost of holding them, and what they buy the reader who arrives skeptical (which is the reader we want).
Rule 1: every claim carries an evidence class
A claim without a class is a mood. Our classes are narrow and boring on purpose. Class A is empirical, in-session: something we ran, something you can run alongside us. Class B is code or artifact inspection: repository state, config, logs, diffs. Class C is configuration and integration: what is wired to what, at what version, under which flags. Class E is expert citation into the literature, cited and never rehosted; for example, Parr, Pezzulo and Friston (2022), Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, MIT Press (Class E). Class F flags that a falsifier is present on the claim. Class U marks a claim as unverified and quarantines it from any downstream use.
The classes travel with the claim through the ledger. If a claim is upgraded from Class C to Class A because we ran the thing, the timestamp of the upgrade is in the ledger too. If it is downgraded because a run failed to replicate, that is in the ledger as well.
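As an illustration only, here is a minimal sketch of how a ledger entry might carry its class and its history of upgrades and downgrades. The names Claim, ClassChange, reclassify and usable_downstream are hypothetical and not taken from our actual ledger; treating Class F as a boolean flag rather than a rank is also an assumption, based on its description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class EvidenceClass(Enum):
    A = "empirical, in-session"
    B = "code or artifact inspection"
    C = "configuration and integration"
    E = "expert citation"
    U = "unverified"


@dataclass
class ClassChange:
    """One timestamped upgrade or downgrade of a claim's class."""
    old: EvidenceClass
    new: EvidenceClass
    reason: str
    at: datetime


@dataclass
class Claim:
    text: str
    evidence_class: EvidenceClass
    has_falsifier: bool = False  # assumption: Class F modelled as a flag
    history: list[ClassChange] = field(default_factory=list)

    def reclassify(self, new: EvidenceClass, reason: str,
                   at: Optional[datetime] = None) -> None:
        """Change the class and record when and why it changed."""
        at = at or datetime.now(timezone.utc)
        self.history.append(ClassChange(self.evidence_class, new, reason, at))
        self.evidence_class = new

    @property
    def usable_downstream(self) -> bool:
        """Class U claims are quarantined from downstream use."""
        return self.evidence_class is not EvidenceClass.U


# Hypothetical usage: an upgrade after a run, then a downgrade after a
# failed replication. Both changes stay in the history.
claim = Claim("example claim text", EvidenceClass.C)
claim.reclassify(EvidenceClass.A, "ran it in session")
claim.reclassify(EvidenceClass.C, "run failed to replicate")
for change in claim.history:
    print(change.at.isoformat(), change.old.name, "->", change.new.name, change.reason)
```

The design point is that reclassification appends to the history instead of overwriting it, so a downgrade is as visible as an upgrade.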
Rule 2: every claim carries a falsifier
A falsifier is the observation that would kill the claim, written down before we run. On the Cell Lab benchmark the five headline claims each have a pre-registered falsifier attached (Class C). Two of the seven disturbance families currently fail one of the baseline gates, and both losses are recorded on the science page next to the wins. The point of a falsifier is not to be flattering. The point is to be settleable.
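A hedged sketch of what "written down before we run" could look like in code. Falsifier, evaluate_gate, the "green" label, the family names and every number below are hypothetical; only the idea of a pre-registered, timestamped falsifier and a gate that closes red comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

# Assumed observation shape: family name -> (candidate score, baseline score)
Observation = Dict[str, Tuple[float, float]]


@dataclass(frozen=True)  # frozen: the falsifier cannot be edited after registration
class Falsifier:
    description: str
    registered_at: datetime
    is_triggered: Callable[[Observation], bool]


def evaluate_gate(falsifier: Falsifier, observation: Observation,
                  observed_at: datetime) -> str:
    """Return "red" if the falsifying observation occurred, else "green"."""
    if observed_at <= falsifier.registered_at:
        # A falsifier written after the run is not a pre-registration.
        raise ValueError("falsifier must be registered before the run")
    return "red" if falsifier.is_triggered(observation) else "green"


# Illustrative values only; these are not benchmark results.
registered = datetime(2024, 1, 1, tzinfo=timezone.utc)
ran = datetime(2024, 1, 2, tzinfo=timezone.utc)
beats_baseline_everywhere = Falsifier(
    description="any family where the candidate does not beat the baseline",
    registered_at=registered,
    is_triggered=lambda obs: any(cand <= base for cand, base in obs.values()),
)
observation = {"family_a": (0.9, 0.5), "family_b": (0.4, 0.5)}
print(evaluate_gate(beats_baseline_everywhere, observation, ran))
```

The timestamp check is the part that matters: it makes the ordering of commitment and observation something a reader can verify rather than take on trust.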
Rule 3: ship the code where we can
Five active-inference labs run in the browser with zero backend (Class B, Class C). A public Model Context Protocol (MCP) server lets any language model call the labs directly: 16 tools, no auth, no token, just the URL (Class C). The preprint is on Zenodo with a DOI, cited as an unrefereed preprint because that is what it is (Class E). Where a component cannot be shipped in the open (for example, patent-grade math that we hold privately), the ledger says so and the class stays honest.
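For readers who have not used the Model Context Protocol: it exchanges JSON-RPC 2.0 messages, and the tools/list and tools/call methods are how a client discovers and invokes tools. The sketch below only builds those message bodies and sends nothing over the network. The tool name and arguments are hypothetical placeholders, not the names of our 16 tools, and a real client would normally use an MCP client library, which also performs the initialize handshake that precedes these calls.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id per request


def jsonrpc_request(method: str,
                    params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. Name and arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "hypothetical_lab_tool", "arguments": {"example_dial": 0.5}},
)

print(json.dumps(list_tools, indent=2))
print(json.dumps(call_tool, indent=2))
```

Listing the tools first is the honest way to start: the server's own response, not this sketch, is the authority on what can be called and with which arguments.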
Rule 4: invite challenge
Every gate on this site is an invitation. Move a dial in the Precision Lab and try to reproduce a claimed regime. Read the Zenodo preprint and mail the errata. Call the MCP server from your own LLM client and run a sweep we did not think to run. If you can produce the observation that sits in the falsifier field, the gate closes red and the ledger records it. That is a good day, not a bad one.
What it costs
It costs speed. It is slower to say "UNI beats random in 7 of 7, rule-based in 6 of 7, neural in 5 of 7, and loses on memory_leak, database_flaky, and cpu_noisy_neighbor" than to say "UNI beats the baselines". It costs comfort. A published falsifier is a public commitment to admit a specific way of being wrong, in advance, with a timestamp. It costs some readers, who prefer a program that already knows the answer.
What it buys
It buys the kind of trust that compounds: trust in the method rather than trust in the maker. A reader who has seen us record a loss cleanly has better reason to believe the wins than a reader who has seen the wins alone. It buys correctability. If a claim is wrong, we can find that out, in public, and update. It buys a kind of readership we could not otherwise recruit: skeptics, methodologists, and practitioners who have seen enough marketing prose to distrust anything that does not offer a way to be checked.