Cluster, KL divergence and Bayesian inference

Why minimizing surprise is not avoiding novelty.

The first time somebody meets the phrase "minimize free energy," a reasonable objection lands within a minute. If the agent is built to reduce surprise, why doesn't it just sit in a dark room, close its eyes, and refuse to move? The dark-room objection is old, and the reply is not a rhetorical trick. It follows from how the math is set up.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

Two free energies, not one

Active inference uses two related quantities, and conflating them is where the dark-room confusion comes from Class E. Variational free energy (F) scores the current model against current observations. It is an upper bound on surprise (the negative log evidence) about what has already been sensed. Minimizing F is a fitting problem: match the beliefs to the evidence that just arrived. That is perception and learning, not action.
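To make the bound concrete, here is a minimal sketch of F for a single discrete hidden state. The two-state model, the numbers, and the function names are illustrative assumptions, not the UNI implementation. It shows that F sits above the surprise for an arbitrary belief and meets it when the belief equals the exact Bayesian posterior.

```python
import math

def variational_free_energy(q, prior, likelihood, obs):
    """F = sum_s q(s) * [ln q(s) - ln p(obs, s)] for one discrete hidden state."""
    total = 0.0
    for s, q_s in enumerate(q):
        if q_s > 0.0:
            joint = prior[s] * likelihood[s][obs]  # p(obs, s) = p(s) * p(obs | s)
            total += q_s * (math.log(q_s) - math.log(joint))
    return total

def exact_posterior(prior, likelihood, obs):
    """Bayes' rule; also returns the model evidence p(obs)."""
    joint = [prior[s] * likelihood[s][obs] for s in range(len(prior))]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence

# Hypothetical toy model: likelihood[s][o] = p(o | s).
prior = [0.5, 0.5]
likelihood = [[0.9, 0.1], [0.2, 0.8]]
obs = 0

posterior, evidence = exact_posterior(prior, likelihood, obs)
surprise = -math.log(evidence)

# Belief left at the prior (no perception yet) versus belief fitted to the evidence.
f_prior = variational_free_energy(prior, prior, likelihood, obs)
f_posterior = variational_free_energy(posterior, prior, likelihood, obs)

print(f"surprise={surprise:.4f}  F(prior)={f_prior:.4f}  F(posterior)={f_posterior:.4f}")
```

The gap between F and surprise is the KL divergence from the belief to the exact posterior, which is why fitting the belief closes it.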

Expected free energy (G), by contrast, scores candidate policies against a preferred-outcome distribution and against how much they will teach the agent. G looks forward, over sequences of future actions and the future observations those actions would produce. Action selection in active inference is the argmin over G, not over F (Parr, Pezzulo and Friston, 2022) Class E. The two objectives answer different questions, and the dark-room objection quietly assumes only F exists.

The two terms inside expected free energy

G decomposes into a pragmatic term and an epistemic term Class E. The pragmatic term is the expected divergence between predicted outcomes and preferred outcomes: a KL divergence between what the policy is likely to produce and the distribution the agent is built to prefer. This is the goal-directed pressure, the part that says a hungry agent prefers food-shaped futures.

The epistemic term is the expected information gain about hidden states that the policy would yield. Information gain itself is never negative, but it enters G with a minus sign, so minimizing G means maximizing expected information gain. Uninformative futures are penalized. A policy that would leave the agent guessing scores worse than a policy that would resolve the guess, all else equal.
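One common one-step way to write the decomposition is given below. The notation is an assumption on our part: q is the predictive belief under policy \pi, C stands for the preference distribution, and H is Shannon entropy. In this grouping the pragmatic term is an expected negative log preference. Regrouping the same G as risk plus ambiguity gives the KL divergence between predicted and preferred outcomes.

```latex
\begin{aligned}
G(\pi)
&= \underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o \mid C)\big]}_{\text{pragmatic term}}
 \;-\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\Big[ D_{\mathrm{KL}}\big[\, q(s \mid o, \pi) \,\|\, q(s \mid \pi) \,\big] \Big]}_{\text{epistemic term (expected information gain)}} \\
&= \underbrace{D_{\mathrm{KL}}\big[\, q(o \mid \pi) \,\|\, p(o \mid C) \,\big]}_{\text{risk}}
 \;+\; \underbrace{\mathbb{E}_{q(s \mid \pi)}\Big[ H\big[\, p(o \mid s) \,\big] \Big]}_{\text{ambiguity}}
\end{aligned}
```

The two lines are equal because the expected information gain is the entropy of predicted outcomes minus the ambiguity.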

A dark room satisfies the pragmatic term poorly (the food, water, and social contact that the preference distribution favors are all absent), and it satisfies the epistemic term catastrophically (nothing to learn). The dark-room policy is not a minimizer of G. It is close to a maximizer.
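The comparison can be sketched numerically. Everything below is a hypothetical toy, not the lab's model: two hidden states (food on the left or on the right), three outcomes (darkness, a cue on the left, a cue on the right), and made-up preferences. The sketch uses the one-step form in which G is the expected negative log preference minus the expected information gain.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def expected_free_energy(q_states, likelihood, preferences):
    """Return (pragmatic, info_gain, G) for a one-step policy.

    q_states[s]      : predicted hidden-state probabilities under the policy
    likelihood[s][o] : p(o | s) under the policy
    preferences[o]   : preferred-outcome distribution (all entries > 0)
    """
    n_states, n_outcomes = len(q_states), len(preferences)
    q_outcomes = [
        sum(q_states[s] * likelihood[s][o] for s in range(n_states))
        for o in range(n_outcomes)
    ]
    # Pragmatic term: expected negative log preference of predicted outcomes.
    pragmatic = -sum(
        q_o * math.log(preferences[o]) for o, q_o in enumerate(q_outcomes) if q_o > 0.0
    )
    # Expected information gain = H[q(o)] - E_q(s)[ H[p(o | s)] ].
    ambiguity = sum(q_s * entropy(likelihood[s]) for s, q_s in enumerate(q_states))
    info_gain = entropy(q_outcomes) - ambiguity
    return pragmatic, info_gain, pragmatic - info_gain

q_states = [0.5, 0.5]              # agent does not know where the food is
preferences = [0.02, 0.49, 0.49]   # darkness is dispreferred, food cues are preferred

dark_likelihood = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]     # darkness regardless of state
explore_likelihood = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # cues mostly reveal the state

dark = expected_free_energy(q_states, dark_likelihood, preferences)
explore = expected_free_energy(q_states, explore_likelihood, preferences)

print("dark room: pragmatic=%.3f info_gain=%.3f G=%.3f" % dark)
print("explore:   pragmatic=%.3f info_gain=%.3f G=%.3f" % explore)
```

With these assumed numbers the dark-room policy gains no information and pays the full cost of a dispreferred outcome, so its G is the larger of the two.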

What this looks like in a lab

In our Precision Lab the agent has three dials: sensory precision, transition precision, and policy temperature Class C. Raise policy temperature and the pragmatic term loosens: the agent wanders more, sampling states it did not need to sample to eat. Lower it and the pragmatic pull dominates. Neither extreme is "correct" for all environments. The regime is what matters, and precision is the upstream variable. Novelty is not a bug the agent is trying to escape. It is a quantity the agent trades off against goal satisfaction, tick by tick.
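As a rough illustration of the temperature dial, the sketch below assumes that policies are sampled from a softmax over negative G scaled by a temperature. That mapping is our assumption for the example. The Precision Lab may define the dial differently, and the G values here are invented.

```python
import math

def policy_distribution(G, temperature):
    """Softmax over -G / temperature (assumed form of the policy-temperature dial)."""
    logits = [-g / temperature for g in G]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical expected free energies for three candidate policies.
G = [3.9, 0.7, 1.5]

cold = policy_distribution(G, temperature=0.1)   # near-deterministic argmin over G
hot = policy_distribution(G, temperature=10.0)   # close to uniform: more wandering

print("cold:", [round(p, 3) for p in cold])
print("hot: ", [round(p, 3) for p in hot])
```

Under this assumption a low temperature recovers the argmin over G, while a high temperature flattens the distribution over the whole of G, so the agent samples policies it did not need in order to reach preferred outcomes.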

Perception minimizes variational free energy over the past. Action minimizes expected free energy over the future. The epistemic term inside expected free energy is why a healthy agent leaves the dark room.

Where this fits in the family map

Themesis published a resource map for people finding active inference in 2026, "Where to Start with Active Inference, A Resource Map for 2026," which lists SolutionWright among five pathways. Our one-line frame on that link, in our voice: it is the map that catalogs the current entry points, and we are one of them. We link to it rather than paraphrase it, and inclusion in a resource map is a factual listing, not an endorsement.

What this post does not claim

This post explains a design choice inside a model of cognition. It does not claim UNI has general intelligence, and it does not claim the free energy principle is settled science. The preprint on the science page is unrefereed Class C. The epistemic story above is textbook active inference, cited to Parr, Pezzulo and Friston, and the lab implementation is inspectable in the browser Class C. If any of the three dials behaves differently than described when you move it, that is a falsifier we want to see.