Planning as Inference: What That Phrase Means

One slogan, one small shift in the generative model, and planning becomes the same kind of computation as perception. Here is what the phrase actually says, and what it does not.

In active inference the same variational machine that answers "what is the world doing?" is asked to answer "what should I do next?" That is the whole trick behind the phrase planning as inference. This post unpacks the shift, keeps the math conceptual, and points at where the receipts live.

The shift in the generative model

Perception in a POMDP-style active inference agent minimizes variational free energy, an upper bound on surprise (negative log evidence). Minimizing it yields an approximate posterior over hidden states given observations (Class E; Parr, Pezzulo, Friston, 2022). Nothing exotic yet: it is Bayesian inference made tractable by a chosen family of approximating distributions.
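To make the bound concrete, here is a minimal Python sketch for a two-state, two-observation toy model. The arrays A and D and their values are illustrative assumptions, not taken from the book or from any lab on this site. The sketch checks that free energy evaluated at the exact posterior equals the negative log evidence, and that a different belief (here, the prior) scores higher.

```python
import numpy as np

def free_energy(q, A, D, o):
    """Variational free energy F[q] = E_q[ln q(s) - ln p(o, s)] for a discrete model.

    q : belief over hidden states (all entries > 0)
    A : likelihood, A[o, s] = p(o | s)
    D : prior over hidden states, p(s)
    o : index of the observation actually received
    """
    joint = A[o, :] * D  # p(o, s) for the observed o
    return float(np.sum(q * (np.log(q) - np.log(joint))))

# Hypothetical toy model (values chosen for illustration only).
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
D = np.array([0.5, 0.5])
o = 0

joint = A[o, :] * D
evidence = joint.sum()          # p(o)
posterior = joint / evidence    # exact Bayesian posterior p(s | o)

F_exact = free_energy(posterior, A, D, o)  # equals -ln p(o)
F_prior = free_energy(D, A, D, o)          # any other belief gives a looser bound

print("negative log evidence:", -np.log(evidence))
print("F at exact posterior: ", F_exact)
print("F at prior belief:    ", F_prior)
```

In a model this small the exact posterior is available, so nothing needs approximating. The point is only that the quantity being minimized is an ordinary function of a belief, and that its minimum recovers Bayes.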

Planning as inference makes one small structural change. Policies, meaning sequences of intended actions, are added to the generative model as random variables. Preferred observations get encoded as a prior over what the agent expects to see in the future. Now the same inference machinery that reasons about hidden states can reason about which policy is most consistent with the model that predicts the preferred future (Class E; Parr, Pezzulo, Friston, 2022, Ch. 7).

Perception asks: given what I have seen, what state am I in? Planning asks: given the outcomes I prefer, which sequence of actions makes those outcomes most probable under my model? Same math, different query.
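A rough sketch of the planning query, under loud assumptions: a hypothetical two-state, two-action world, policies enumerated exhaustively, and each policy scored only by the expected log probability of its predicted observations under the preference prior C. That is a simplification of the full objective, which the next section describes. All names and numbers (A, B, C, D, policy_posterior) are made up for illustration.

```python
import itertools
import numpy as np

# Hypothetical toy world (illustrative values only).
B = np.array([                      # B[a][s_next, s] = p(s_next | s, a)
    [[1.0, 0.0], [0.0, 1.0]],       # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],       # action 1: switch
])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])          # A[o, s] = p(o | s)
D = np.array([1.0, 0.0])            # current belief: in state 0
C = np.array([0.05, 0.95])          # preference prior: the agent expects to see o = 1

def policy_posterior(horizon=2):
    """Treat policies as random variables and infer a distribution over them.

    Each policy is rolled forward through B, its predicted observations are
    scored against C, and the scores are normalized with a softmax.
    """
    policies = list(itertools.product(range(len(B)), repeat=horizon))
    log_scores = []
    for pi in policies:
        qs = D.copy()
        total = 0.0
        for a in pi:
            qs = B[a] @ qs                    # predicted state belief after action a
            qo = A @ qs                       # predicted observation distribution
            total += float(qo @ np.log(C))    # expected log preference at this step
        log_scores.append(total)
    log_scores = np.array(log_scores)
    w = np.exp(log_scores - log_scores.max())  # subtract max for numerical stability
    return policies, w / w.sum()

policies, q_pi = policy_posterior(horizon=2)
for pi, p in zip(policies, q_pi):
    print(pi, round(float(p), 3))
```

In this toy world the highest-probability policy is to switch once and then stay, because that keeps the agent in the state that generates the preferred observation at both steps.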

Why this is not a rebrand of reinforcement learning

In reward-driven control, the agent maximizes an external scalar. In planning as inference the agent minimizes expected free energy over policies, a quantity that decomposes into a pragmatic term (how likely a policy makes the preferred observations) and an epistemic term (how much a policy resolves uncertainty about hidden states) (Class E; Parr, Pezzulo, Friston, 2022). Curiosity is not bolted on. It falls out of the same objective, because reducing uncertainty about hidden states is itself inference, and inference is what the agent is already doing.
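The decomposition can be sketched for a single planning step in a discrete model. This is a minimal illustration under stated assumptions, not the implementation used in any lab here: qs is the state belief a policy is predicted to produce, A is a hypothetical likelihood, C is a hypothetical preference prior, and the function name expected_free_energy is ours. The epistemic term is computed as the expected information gain about hidden states (the mutual information between states and observations under the predicted beliefs).

```python
import numpy as np

def expected_free_energy(qs, A, C):
    """One-step expected free energy for the state belief predicted under a policy.

    qs : q(s | pi), predicted belief over hidden states
    A  : likelihood, A[o, s] = p(o | s)
    C  : preference prior over observations
    Returns (G, pragmatic, epistemic), where G = -(pragmatic + epistemic).
    """
    eps = 1e-16                                    # guards against log(0)
    qo = A @ qs                                    # predicted observations q(o | pi)
    pragmatic = float(qo @ np.log(C + eps))        # expected log preference
    H_qo = -float(qo @ np.log(qo + eps))           # entropy of predicted observations
    H_A = -np.sum(A * np.log(A + eps), axis=0)     # entropy of p(o | s), per state
    ambiguity = float(H_A @ qs)                    # expected observation noise
    epistemic = H_qo - ambiguity                   # expected information gain about s
    return -(pragmatic + epistemic), pragmatic, epistemic

# Hypothetical values for illustration.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
C = np.array([0.05, 0.95])

# A policy predicted to leave the agent certain it is in state 1 ...
G_sure, prag_sure, epi_sure = expected_free_energy(np.array([0.0, 1.0]), A, C)
# ... versus one predicted to leave it undecided between the two states.
G_unsure, prag_unsure, epi_unsure = expected_free_energy(np.array([0.5, 0.5]), A, C)

print("certain:   G, pragmatic, epistemic =", G_sure, prag_sure, epi_sure)
print("uncertain: G, pragmatic, epistemic =", G_unsure, prag_unsure, epi_unsure)
```

When the predicted state is already certain, an observation has nothing left to resolve and the epistemic term is essentially zero. When the predicted state is uncertain and the likelihood is informative, the epistemic term is positive and lowers G. Both terms sit in one number, which is the single-ledger point.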

The practical consequence is that policy scoring, exploration, and exploitation share a single ledger. You do not need a separate exploration bonus to explain why an active inference agent probes an ambiguous corner of its environment before committing.

What the labs show, and what they do not

The Precision Lab on this site implements a discrete-time POMDP with policies scored by expected free energy at planning depth 1 or 2 (Class C). Moving the sensory precision, transition precision, and policy temperature dials produces qualitatively distinct regimes: cautious probing, decisive commitment, or stalling. That is a demonstration of the mechanism, not a claim that this is the exclusive route to those behaviors. Falsifiers live in the Cell Lab, where a pre-registered benchmark records where the approach loses as plainly as where it wins (Class C).
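For readers who want a feel for one of those dials, here is a hedged sketch of how a temperature-like parameter reshapes a distribution over policies. It assumes the common softmax form q(pi) proportional to exp(-gamma * G(pi)), with gamma acting as an inverse temperature. The Precision Lab's actual parameterization is not specified in this post, the G values are invented, and no mapping from gamma values onto the lab's regimes is claimed.

```python
import numpy as np

def policy_distribution(G, gamma):
    """Softmax over negative expected free energy: q(pi) ~ exp(-gamma * G(pi)).

    gamma is an assumed inverse temperature: larger means sharper choices.
    """
    z = -gamma * np.asarray(G, dtype=float)
    z -= z.max()                 # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def entropy(p):
    """Shannon entropy in nats, skipping zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Invented expected free energies for three candidate policies.
G = [1.2, 1.0, 1.5]

for gamma in (0.1, 1.0, 16.0):
    q = policy_distribution(G, gamma)
    print(f"gamma={gamma:>5}: q(pi)={np.round(q, 3)}  entropy={entropy(q):.3f}")
```

The ranking of policies never changes as gamma moves. What changes is how strongly the agent commits to the best-scoring one, which is one ingredient in the difference between hesitating and deciding.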

What planning as inference does not claim

It does not claim that biological brains literally compute expected free energy over enumerated policies. It does not claim that the free energy in question is a thermodynamic quantity: it is the variational free energy of inference, measured in nats. It does not claim that any single scoring rule is universally best.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

Further reading, across the family: Themesis, "Where to Start with Active Inference, A Resource Map for 2026". Our honest one-line frame: an external resource map that lists SolutionWright among five entry pathways into active inference, useful as an orientation anchor when readers arrive from other traditions.