The AI landscape of 2026 looks different from where it stood a year ago. The frontier labs are quieter about the finish line, the benchmarks are more honest about what they measure, and the acronym at the center of the whole conversation has started to feel like a coat that no longer fits. This is a reading of that shift from where UNI stands, and an argument for a different frame.
What actually changed
Three things moved. Model scale kept climbing, but the marginal gain on general reasoning kept shrinking, and the community started publishing losses next to wins instead of only the wins (Class C, based on the published benchmark and preprint conventions we see landing in 2026). The ARC-AGI benchmark family got harder and stayed unsolved by the crowd-favorite architectures. And the vocabulary itself began to fracture: a growing number of researchers now write "general intelligence" only inside careful definitional fences, and the phrase that used to mean "the goal" now comes wrapped in a paragraph of caveats. That is progress.
The interesting move is not that any single system got closer to universal competence. It is that the field started asking a different question: what would it take to build a system that predicts, senses, acts, and stays inside its own viable envelope, not just on a leaderboard but under disturbance in the open? (Class E, expert citation: this is the active-inference program articulated in Parr, Pezzulo and Friston, 2022.)
Why "artificial general intelligence" strains under load
The word "artificial" was always doing two jobs at once. It flagged that the substrate was engineered, and it implied that the intelligence itself was separate from the biological family that produced language, planning, and care. In 2026 both of those jobs are getting harder. The engineered substrate is now entangled with real economies, real electricity, real biology through the humans in the loop, and real ecosystems downstream. And the intelligence, insofar as any current system has any, is a reflection of natural intelligences pressed through parameters. Calling that "artificial" reads less like a technical description and more like a marketing choice.
"General" strains for a different reason. Justifying the word would require measuring behavior across many environments, many time scales, and many disturbance families, with the system held inside a viable set throughout. Almost no benchmark in wide use tests that. The Cell Lab benchmark on this site tests a narrow slice of it and shows both wins and losses (Class C, from the pre-registered results published on the Science page). A single controller is not universally best. That is the honest reading of the data, and it is exactly the reading that the word "general" tends to obscure.
The frame we use instead
UNI is a working hypothesis about an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, labeled by evidence class, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.
"Natural" is the load-bearing word. It says the intelligence we are reaching for is continuous with the family of intelligences already in the world, and it commits us to a math that runs on prediction, sensing, and action, not on scale for its own sake. It also commits us to publishing losses. The Cell Lab loses three of seven disturbance families, in the open, with the numbers on the page.
What to do with all this
Read the labs. Move the dials. Run the benchmark against your own controller if you have one, and tell us where UNI loses to something simpler. That is the whole point of building in the open: the frame is only worth what its receipts can defend, and the receipts get better every time somebody outside the project pushes on them.