SeedIQ and ARC-AGI 3: A Third-Party Datapoint

A result outside our lab is worth reporting when it changes the shape of what is defensible. SeedIQ on ARC-AGI 3, run on a laptop, is one of those results. We do not claim SeedIQ is UNI. We treat it as a third-party datapoint in the literature that surrounds our own working hypothesis.

The short version

ARC-AGI 3 is a hard, out-of-distribution reasoning benchmark on which frontier large language models have made little progress. A small, non-transformer system called SeedIQ posted a result on a MacBook Pro that outperformed the leading LLM baselines (Class E). That is a public, independently reported outcome, not a claim by us (Class F).

External reference

Themesis: SeedIQ Just Stomped ARC-AGI 3 on a MacBook Pro. In our voice: third-party evidence that non-transformer, active-inference-flavored systems can outperform large language models on hard tasks.

Why this is context, not a claim

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

SeedIQ is not our system. Its internals are not ours to describe. What the SeedIQ result contributes is a public existence proof (Class E): a small, non-transformer system, running on consumer hardware, can beat LLM baselines on a benchmark specifically designed to resist rote pattern completion. That is relevant context for anyone judging whether active inference and related generative-model approaches deserve serious attention alongside scaling.

Where our own work sits

Our own instruments are the five POMDP labs, the Cell Lab pre-registered falsification benchmark, and the Zenodo preprint that reviews the free energy principle (Class C). The preprint is not yet peer reviewed. The Cell Lab publishes its losses next to its wins. The math anchor remains Parr, Pezzulo, and Friston (2022), Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, MIT Press (Class E). We minimize variational free energy, an upper bound on surprise, through perception and action. That is the frame we recognize in the wider non-transformer results, without importing anyone else's claim.
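
For readers who want the bound written out, here is the standard textbook form. The notation is our assumption for this page, not something fixed by the preprint or the labs: o is an observation, s is a hidden state, p(o, s) is the agent's generative model, and q(s) is its approximate posterior belief.

```latex
\begin{aligned}
F[q] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
     &= D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o) \\
     &\ge -\ln p(o)
\end{aligned}
```

The last line follows because the KL divergence is never negative. The bound is tight exactly when q(s) equals the true posterior p(s | o), so lowering F by changing beliefs pulls q toward that posterior; this is the sense in which free energy is an upper bound on surprise, -ln p(o).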

External reference

Themesis: Deep Learning, Transformers, and SeedIQ, Three Industry Breakthroughs. In our voice: a lineage argument that situates SeedIQ next to prior scaling shifts. We use it to place UNI in the same family of questions, not to claim arrival.

How we read a result like this

A single benchmark is not a theory, and a laptop run is not a universal claim. Two disciplined readings are available. First, the SeedIQ number is genuine evidence that the space of viable architectures is wider than transformer-only (Class E). Second, that width is exactly what an active-inference program predicts: agents that carry a generative model, act to reduce prediction error, and update Bayesian beliefs over hidden states should generalize past their training distribution better than systems whose competence is defined by their training corpus. We treat both statements as hypotheses to be tested, not conclusions to be broadcast.
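
To make "update Bayesian beliefs over hidden states" concrete, here is a minimal Python sketch of a single belief update in a two-state world. It is an illustration under our own assumptions, not code from the POMDP labs and not a description of SeedIQ: the function names, the two-state prior, and the likelihood values are all hypothetical. It covers only the perception step, not action selection.

```python
import math


def bayes_update(prior, likelihood):
    """Return the posterior over hidden states and the evidence p(o).

    prior[i]      : p(s = i), belief before the observation
    likelihood[i] : p(o | s = i) for the one observation actually seen
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # p(o), the marginal likelihood
    posterior = [u / evidence for u in unnormalized]
    return posterior, evidence


def free_energy(q, prior, likelihood):
    """Variational free energy F = sum_s q(s) [ln q(s) - ln p(o, s)].

    Assumes p(o, s) > 0 wherever q(s) > 0 (true for the toy numbers below).
    """
    total = 0.0
    for q_s, p_s, l_s in zip(q, prior, likelihood):
        if q_s > 0.0:  # terms with q(s) = 0 contribute nothing
            total += q_s * (math.log(q_s) - math.log(p_s * l_s))
    return total


# Hypothetical toy numbers, chosen only for illustration.
prior = [0.5, 0.5]        # two hidden states, equally likely a priori
likelihood = [0.9, 0.2]   # p(o | s) for the observed outcome

posterior, evidence = bayes_update(prior, likelihood)
surprise = -math.log(evidence)

f_before = free_energy(prior, prior, likelihood)      # belief not yet updated
f_after = free_energy(posterior, prior, likelihood)   # belief set to the posterior

print("posterior:", posterior)
print("surprise :", surprise)
print("F before :", f_before)
print("F after  :", f_after)
```

In this toy case the free energy of the un-updated belief sits above the surprise, and it drops to the surprise once the belief equals the exact posterior. Real agents of the kind discussed here work with approximate posteriors over much larger state spaces, so treat this as the arithmetic of the idea, not a model of any system named on this page.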

What this does not say

It does not say UNI has arrived. It does not say active inference has settled the question of general reasoning. It does not name SeedIQ as part of our project, and it does not import any authorship claim onto our work. It says the outside evidence is moving in a direction that is consistent with the family of ideas we build in.