KL Divergence and Bayesian Inference in Active Inference, Universal Natural Intelligence

By Michael Polzin, 1 July 2026. Evidence classes present: E, B, F

An active-inference agent does two things at once: it infers what is happening (Bayesian update on beliefs), and it acts to make what happens match what it prefers (policy selection). KL divergence is the ruler both steps use. This post is a pillar: it names the pieces, shows how they fit, and points at where you can watch them run.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

Two objects, one ruler

Bayesian inference asks: given a generative model of the world, and given an observation, what should I believe about the hidden state? The exact posterior is often intractable. Variational inference replaces the exact posterior with an approximate distribution Q(s) chosen from a family that is easy to compute, and picks the Q that is closest to the true posterior P(s | o) in the sense of Kullback-Leibler divergence Class E (Parr, Pezzulo and Friston, 2022, chapters 2 and 4).

KL divergence D_KL[Q || P] is a non-symmetric, non-negative quantity, measured in nats when the natural logarithm is used. It is zero exactly when Q equals P, and it grows as they disagree. It is not a metric in the geometric sense (it is asymmetric and does not satisfy the triangle inequality), but it is the right ruler here because it drops out of the decomposition of the free energy functional Class E.
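The properties above are easy to check by hand for discrete distributions. The following is a minimal sketch, not UNI's code: the function name `kl_divergence` and the two example distributions are made up for illustration.

```python
import math

def kl_divergence(q, p):
    """D_KL[Q || P] in nats for two discrete distributions given as lists."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # 0 * log(0 / p) is taken as 0 by convention
        if pi == 0.0:
            return math.inf  # Q puts mass where P has none
        total += qi * math.log(qi / pi)
    return total

# Illustrative distributions over three outcomes (assumed values).
q = [0.7, 0.2, 0.1]
p = [0.5, 0.3, 0.2]

print("KL[Q || Q] =", kl_divergence(q, q))  # zero when the two agree
print("KL[Q || P] =", kl_divergence(q, p))  # positive when they disagree
print("KL[P || Q] =", kl_divergence(p, q))  # a different number: not symmetric
```

Swapping the arguments changes the answer, which is why the direction of the divergence (Q relative to P) has to be stated every time.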

Variational free energy: an upper bound on surprise

Surprise, in the technical sense, is the negative log evidence of the observation, -log P(o). An organism cannot compute it directly, because P(o) requires marginalising the generative model over every possible hidden state. Variational free energy F(Q, o) gives a computable substitute. It can be written (conceptually, without proprietary equations) as the sum of two terms:

Complexity term: D_KL[Q(s) || P(s)], the KL divergence between the approximate posterior and the prior. This is the belief update cost.
Accuracy term: -E_Q[log P(o | s)], the negative expected log-likelihood of the observation under current beliefs. This is the "fit" cost.

Two facts hold at once: F equals surprise plus the KL divergence between Q(s) and the true posterior P(s | o), and, because that divergence is never negative, F is always greater than or equal to surprise. So minimising F with respect to Q pushes Q toward the true posterior. That is the sense in which minimising variational free energy is approximate Bayesian inference, and it becomes exact Bayesian inference when the Q-family is unrestricted Class E (Parr et al., 2022, chapter 4).
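Both facts can be checked numerically on a toy model. This is a sketch under stated assumptions: a two-state world, one observed outcome, and made-up numbers for the prior and likelihood. The helper names `kl` and `free_energy` are illustrative, not taken from the UNI codebase.

```python
import math

# Hypothetical two-state model; the numbers are illustrative only.
prior = [0.5, 0.5]            # P(s)
likelihood_of_o = [0.9, 0.2]  # P(o | s) for the one observation that was seen

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

def free_energy(q):
    complexity = kl(q, prior)  # cost of moving beliefs away from the prior
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood_of_o))
    return complexity - accuracy  # F = complexity + (negative accuracy)

# In a model this small the exact quantities are computable, so we can compare.
evidence = sum(ps * li for ps, li in zip(prior, likelihood_of_o))  # P(o)
surprise = -math.log(evidence)
posterior = [ps * li / evidence for ps, li in zip(prior, likelihood_of_o)]

for q in ([0.5, 0.5], [0.7, 0.3], posterior):
    f = free_energy(q)
    rounded_q = [round(x, 3) for x in q]
    print(f"Q={rounded_q} F={f:.4f} surprise={surprise:.4f} "
          f"KL[Q || posterior]={kl(q, posterior):.4f}")
```

In each row F minus surprise matches the KL divergence to the exact posterior, and the gap closes only in the last row, where Q is the exact posterior.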

A useful gut check: the complexity term punishes moving away from what you already believe, the accuracy term punishes not fitting what you just saw. Perception is the compromise. The agent believes the smallest new thing that still explains the data.

Expected free energy: risk plus ambiguity

Perception uses variational free energy. Action uses a forward-looking cousin called expected free energy G(pi) over policies pi. For each candidate policy, the agent imagines the trajectory of observations and hidden states it would produce, then scores that trajectory with two terms:

Risk (the extrinsic side): the KL divergence between predicted future observations under the policy and the agent's preferred observations. This is "how far the outcome would be from what I want."
Ambiguity (the epistemic side): the expected entropy of the likelihood P(o | s) over the hidden states the policy is predicted to visit, that is, how noisy the mapping from hidden states to observations would be. This is "how little my observations would tell me."

The agent picks policies with low expected free energy, which means low risk and low ambiguity together Class E (Parr et al., 2022, chapter 7). This is why an active-inference agent naturally balances exploration and exploitation: the ambiguity term rewards actions that reveal information, the risk term rewards actions that reach preferred outcomes. There is no bolt-on exploration bonus. It falls out of the decomposition. See our companion piece on expected free energy, risk plus ambiguity for the operational walkthrough.
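The scoring rule can be sketched for a toy case. The assumptions are loud ones: two hidden states, two observations, two made-up policies whose predicted state distributions are simply given, and ambiguity computed as the expected entropy of P(o | s). The softmax over negative G is the usual textbook form for turning scores into policy probabilities. None of the names or numbers come from the UNI labs.

```python
import math

# Hypothetical toy model; every number here is illustrative.
# A[o][s] = P(o | s): state 0 yields crisp observations, state 1 yields noisy ones.
A = [[0.95, 0.5],
     [0.05, 0.5]]
preferred_obs = [0.8, 0.2]  # preference distribution over observations

# Predicted hidden-state distribution Q(s | pi) for two made-up policies.
predicted_states = {
    "go_to_clear_state": [0.9, 0.1],
    "go_to_noisy_state": [0.1, 0.9],
}

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

def entropy(dist):
    return -sum(x * math.log(x) for x in dist if x > 0.0)

def risk_and_ambiguity(q_states):
    n_obs, n_states = len(A), len(q_states)
    # Predicted observations: push the state beliefs through the likelihood.
    q_obs = [sum(A[o][s] * q_states[s] for s in range(n_states)) for o in range(n_obs)]
    risk = kl(q_obs, preferred_obs)
    # Expected entropy of P(o | s), weighted by how likely each state is.
    ambiguity = sum(q_states[s] * entropy([A[o][s] for o in range(n_obs)])
                    for s in range(n_states))
    return risk, ambiguity

G = {name: sum(risk_and_ambiguity(q)) for name, q in predicted_states.items()}

# Policy probabilities as a softmax over negative expected free energy.
weights = {name: math.exp(-g) for name, g in G.items()}
total = sum(weights.values())
policy_probs = {name: w / total for name, w in weights.items()}

for name in G:
    r, a = risk_and_ambiguity(predicted_states[name])
    print(f"{name}: risk={r:.3f} ambiguity={a:.3f} "
          f"G={G[name]:.3f} P(pi)={policy_probs[name]:.3f}")
```

In this toy setup the policy heading for the crisp state scores lower on both terms, so it receives the higher probability. Changing `preferred_obs` or the noisy column of `A` shows how the two terms can pull in different directions.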

What KL divergence actually does inside UNI

The Precision Lab, Echo Lab, Loop Lab, Heart Lab, and Cell Lab all run the same active-inference core in the browser. Inspection of that core Class B shows KL divergence appearing at three named places:

Inside the perceptual update, as the complexity term between the posterior over hidden states and the prior propagated from the last tick.
Inside policy scoring, as the risk term between predicted observation distributions and the agent's preference vector C.
Inside the precision update, as the driver that reweights how much the agent trusts its sensory likelihood versus its transition model. This is the same "precision" dial exposed on the Precision Lab.

The point of exposing dials is honesty: if you move sensory precision up and behavior changes as the theory predicts, the theory has passed a small test in front of you. If it does not, the theory is wrong in a way you can see. That is the falsifier posture Class F. For a fuller conceptual pass on the inference machinery, see variational inference, a conceptual walkthrough.
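To make the dial test concrete, here is one simple way a sensory precision parameter could enter a belief update. It is an assumption for illustration, not a description of the Precision Lab's internals: the likelihood is raised to a power `gamma` before being combined with the prior, so `gamma = 0` ignores the senses and larger values trust them more. The function name and all numbers are hypothetical.

```python
import math

# Hypothetical two-state example; values are illustrative.
prior = [0.5, 0.5]            # belief carried over from the last tick
likelihood_of_o = [0.8, 0.3]  # P(o | s) for the observation just received

def posterior_with_precision(gamma):
    """Posterior proportional to prior * likelihood ** gamma (assumed form)."""
    unnormalised = [p * (l ** gamma) for p, l in zip(prior, likelihood_of_o)]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

for gamma in (0.0, 0.5, 1.0, 4.0):
    post = posterior_with_precision(gamma)
    # Complexity: how far this update moved beliefs away from the prior.
    print(f"gamma={gamma}: posterior={[round(x, 3) for x in post]} "
          f"complexity={kl(post, prior):.3f}")
```

The prediction to test is directional: as the dial goes up, beliefs should move further from the prior toward what the observation favours. A run where that does not happen would count against the sketch.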

Why this is not "avoiding novelty"

A common misreading of the free energy principle is that agents that minimise surprise must therefore hide in dark rooms. The math says the opposite: because expected free energy contains the ambiguity term, an agent that only minimises risk will accept high ambiguity and pay for it later. Long-horizon minimisation of expected free energy rewards information-seeking now, in service of lower total surprise across the life of the policy Class E. See why minimizing surprise is not avoiding novelty for the extended argument.

The generative model is where the assumptions live

Everything above rests on a generative model: what states the agent represents, what observations it expects from each state, what transitions it predicts under each action, and what outcomes it prefers. Change the generative model and you change what "surprise" even means. The model is not neutral. It is the theory of the world the agent is carrying. Our companion piece on generative models, the organism model of its world unpacks how UNI chooses these pieces and where the modelling honesty fences sit.
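The four ingredients listed above can be written down as a small data structure. This is a sketch only: the field names A, B and D follow the common convention in the discrete active-inference literature (C is the preference vector already mentioned in this post), and the class, the `validate` helper, and the example numbers are hypothetical rather than UNI's actual representation.

```python
import math
from dataclasses import dataclass

def _check_distribution(values, label):
    if any(v < 0.0 for v in values) or not math.isclose(sum(values), 1.0, abs_tol=1e-9):
        raise ValueError(f"{label} is not a probability distribution: {values}")

@dataclass
class GenerativeModel:
    A: list  # A[o][s] = P(o | s): what observations each state produces
    B: dict  # B[action][s_next][s] = P(s_next | s, action): transitions
    C: list  # preferred distribution over observations
    D: list  # prior over the initial hidden state

    def validate(self):
        """Check that every conditional distribution sums to one."""
        n_states = len(self.D)
        _check_distribution(self.D, "D")
        _check_distribution(self.C, "C")
        for s in range(n_states):
            _check_distribution([row[s] for row in self.A], f"A[:, {s}]")
            for action, transition in self.B.items():
                _check_distribution([row[s] for row in transition], f"B[{action}][:, {s}]")
        return True

# Illustrative two-state, two-observation, two-action model (assumed values).
model = GenerativeModel(
    A=[[0.9, 0.2], [0.1, 0.8]],
    B={"stay": [[1.0, 0.0], [0.0, 1.0]], "switch": [[0.0, 1.0], [1.0, 0.0]]},
    C=[0.8, 0.2],
    D=[0.5, 0.5],
)
print("model is well-formed:", model.validate())
```

Writing the model out this way makes the assumptions inspectable: every entry in A, B, C and D is a modelling choice that someone can question.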

How we know when we are wrong

Public gates matter more than eloquence. The Cell Lab is a pre-registered falsification benchmark: five claims and their falsification criteria were written before the runs, and losses are recorded as plainly as wins Class F. If the active-inference controller fails an unseen disturbance family that was in scope, the failure is on the page, not smoothed over. Read gates and falsifiers, how we know when we are wrong for the pattern we hold ourselves to.

Where the math goes next, if you want it hands-on

Two external resources sit well beside this pillar. Themesis offers a short primer, T3, Top Ten Terms in Statistical Mechanics for AI. We recommend it as prep for the UNI Workshop for math-hungry learners (linked as a factual companion, not an endorsement in either direction). Themesis also runs a hands-on course, Building Active Inference in Python. It is a complementary Python path on a different stack from our Elixir workbench, useful if you want a second implementation to triangulate against ours.

What this post is not

It is not a proof, it is not a clinical instrument, and it is not the claim that our system has general intelligence. It is a pillar overview, citing published work and showing the pieces we compute, so that a careful reader can move to the labs and interrogate the claims one at a time. The Zenodo preprint is unrefereed. Behavioral labels are computational phenotypes, hypotheses, not diagnoses.

The Workshop ›

The tightly qualified, publish-gate-backed working session where these pieces are taught and stress-tested.

Generative models ›

Where the assumptions live: states, observations, transitions, and preferences, made explicit.

Expected free energy ›

The forward-looking scoring rule that balances reaching preferred outcomes with reducing uncertainty.

Gates and falsifiers ›

The pre-registered falsification benchmark and the criteria written before the runs.