An active-inference agent does two things at once: it infers what is happening (Bayesian update on beliefs), and it acts to make what happens match what it prefers (policy selection). KL divergence is the ruler both steps use. This post is a pillar: it names the pieces, shows how they fit, and points at where you can watch them run.
UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.
Two objects, one ruler
Bayesian inference asks: given a generative model of the world, and given
an observation, what should I believe about the hidden state? The exact
posterior is often intractable. Variational inference replaces the exact
posterior with an approximate distribution Q(s) chosen from a
family that is easy to compute, and picks the Q that is
closest to the true posterior P(s | o) in the sense of
Kullback-Leibler divergence Class E (Parr,
Pezzulo and Friston, 2022, chapters 2 and 4).
KL divergence D_KL[Q || P] is a non-symmetric, non-negative
quantity, measured in nats when natural logarithms are used. It is zero
exactly when Q equals P, and it grows as they disagree. It is not a metric
in the geometric sense (it is asymmetric and does not satisfy the triangle
inequality), but it is the right ruler here because it falls directly out of
the decomposition of the free energy functional
Class E.
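The properties above can be checked on a toy example. This is a minimal sketch for discrete distributions; the helper name `kl_divergence` and the numbers in `q` and `p` are illustrative assumptions, not values from any UNI lab.

```python
import math

def kl_divergence(q, p):
    """D_KL[Q || P] in nats for two discrete distributions given as lists."""
    total = 0.0
    for q_i, p_i in zip(q, p):
        if q_i == 0.0:
            continue  # 0 * log(0 / p) is taken as 0 by convention
        if p_i == 0.0:
            return math.inf  # Q puts mass where P has none
        total += q_i * math.log(q_i / p_i)
    return total

# Illustrative distributions over three hidden states.
q = [0.7, 0.2, 0.1]
p = [0.5, 0.3, 0.2]

kl_qp = kl_divergence(q, p)  # how far Q is from P
kl_pq = kl_divergence(p, q)  # how far P is from Q (a different number)
kl_qq = kl_divergence(q, q)  # zero: a distribution agrees with itself

print(f"D_KL[Q || P] = {kl_qp:.4f} nats")
print(f"D_KL[P || Q] = {kl_pq:.4f} nats")
print(f"D_KL[Q || Q] = {kl_qq:.4f} nats")
```

Swapping the arguments changes the value, which is the asymmetry that keeps KL divergence from being a metric.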
Variational free energy: an upper bound on surprise
Surprise, in the technical sense, is the negative log evidence of the
observation, -log P(o). An organism cannot compute it
directly, because P(o) requires marginalising the generative
model over every possible hidden state. Variational free energy
F(Q, o) gives a computable substitute. It can be written
(conceptually, without proprietary equations) as:
- Complexity term: D_KL[Q(s) || P(s)], the KL divergence between the approximate posterior and the prior. This is the belief update cost.
- Accuracy term: -E_Q[log P(o | s)], the negative expected log-likelihood of the observation under current beliefs. This is the "fit" cost.
Two facts hold at once: F is always greater than or equal to
surprise, and F equals surprise plus the KL divergence between
Q(s) and the true posterior P(s | o). So
minimising F with respect to Q pushes
Q toward the true posterior. That is the sense in which
minimising variational free energy is approximate Bayesian inference,
and exact Bayesian inference when the Q-family is unrestricted
Class E (Parr et al., 2022, chapter 4).
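Both facts can be verified numerically when the state space is small enough to compute the exact posterior. The sketch below assumes a hypothetical two-state model; `prior`, `likelihood`, and `q_guess` are made-up numbers, and the function names are illustrative only.

```python
import math

# Hypothetical two-state generative model (all numbers are illustrative).
prior = [0.5, 0.5]        # P(s)
likelihood = [0.9, 0.2]   # P(o | s) for the one observation actually seen

def kl(q, p):
    """D_KL[Q || P] in nats for discrete distributions."""
    return sum(q_i * math.log(q_i / p_i) for q_i, p_i in zip(q, p) if q_i > 0)

def free_energy(q):
    """F(Q, o) = complexity - expected log-likelihood, in nats."""
    complexity = kl(q, prior)
    expected_log_lik = sum(q_i * math.log(l_i) for q_i, l_i in zip(q, likelihood))
    return complexity - expected_log_lik

# Exact quantities, available here only because the state space is tiny.
evidence = sum(p_i * l_i for p_i, l_i in zip(prior, likelihood))  # P(o)
surprise = -math.log(evidence)
posterior = [p_i * l_i / evidence for p_i, l_i in zip(prior, likelihood)]

q_guess = [0.6, 0.4]                  # an arbitrary approximate posterior
f_guess = free_energy(q_guess)
gap = f_guess - surprise              # equals D_KL[Q || true posterior]
f_at_posterior = free_energy(posterior)

print(f"surprise        = {surprise:.4f}")
print(f"F(q_guess)      = {f_guess:.4f}")
print(f"gap             = {gap:.4f}")
print(f"KL to posterior = {kl(q_guess, posterior):.4f}")
print(f"F(posterior)    = {f_at_posterior:.4f}")
```

The gap between F and surprise is the KL divergence to the true posterior, so it closes only when Q is the true posterior.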
A useful gut check: the complexity term punishes moving away from what you already believe, the accuracy term punishes not fitting what you just saw. Perception is the compromise. The agent believes the smallest new thing that still explains the data.
Expected free energy: risk plus ambiguity
Perception uses variational free energy. Action uses a forward-looking
cousin called expected free energy G(pi) over policies
pi. For each candidate policy, the agent imagines the
trajectory of observations and hidden states it would produce, then scores
that trajectory with two terms:
- Risk (extrinsic value): the KL divergence between predicted future observations under the policy and the agent's preferred observations. This is "how far the outcome would be from what I want."
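- Ambiguity (epistemic value): the expected uncertainty about observations given the hidden states the policy is predicted to visit, that is, the expected entropy of the likelihood P(o | s). This is "how little the observations would tell me about what is going on."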
The agent picks policies with low expected free energy, which means low risk and low ambiguity together Class E (Parr et al., 2022, chapter 7). This is why an active-inference agent naturally balances exploration and exploitation: the ambiguity term rewards actions that reveal information, the risk term rewards actions that reach preferred outcomes. There is no bolt-on exploration bonus. It falls out of the decomposition. See our companion piece on expected free energy, risk plus ambiguity for the operational walkthrough.
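The scoring can be sketched for a single step ahead. This assumes the standard textbook form: risk is the KL divergence between predicted and preferred observations, and ambiguity is the expected entropy of the likelihood. The matrix `A`, the preferences `C`, the policy names, and the softmax selection rule are illustrative assumptions, not a description of the UNI core.

```python
import math

# P(o | s): rows are hidden states, columns are observations (illustrative).
A = [
    [0.9, 0.1],   # state 0 produces a crisp observation
    [0.5, 0.5],   # state 1 produces an uninformative observation
]
C = [0.8, 0.2]    # preferred distribution over observations

# Predicted hidden-state distribution Q(s | pi) for two hypothetical policies.
predicted_states = {
    "go_to_state_0": [0.9, 0.1],
    "go_to_state_1": [0.1, 0.9],
}

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

def kl(q, p):
    return sum(q_i * math.log(q_i / p_i) for q_i, p_i in zip(q, p) if q_i > 0)

def expected_free_energy(qs):
    """Return (risk, ambiguity) for one predicted state distribution."""
    # Predicted observations: Q(o | pi) = sum_s Q(s | pi) * P(o | s)
    qo = [sum(qs[s] * A[s][o] for s in range(len(qs))) for o in range(len(C))]
    risk = kl(qo, C)
    ambiguity = sum(qs[s] * entropy(A[s]) for s in range(len(qs)))
    return risk, ambiguity

scores = {name: sum(expected_free_energy(qs))
          for name, qs in predicted_states.items()}

# Softmax over negative expected free energy: lower G, higher probability.
weights = {name: math.exp(-g) for name, g in scores.items()}
z = sum(weights.values())
policy_probs = {name: w / z for name, w in weights.items()}

for name in scores:
    print(f"{name}: G = {scores[name]:.3f}, P(pi) = {policy_probs[name]:.3f}")
```

With these numbers the policy heading for state 0 wins on both terms: its predicted observations sit closer to the preferences, and the observations it expects are more informative.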
What KL divergence actually does inside UNI
The Precision Lab, Echo Lab, Loop Lab, Heart Lab, and Cell Lab all run the same active-inference core in the browser. Inspection of that core Class B shows KL divergence appearing at three named places:
- Inside the perceptual update, as the complexity term between the posterior over hidden states and the prior propagated from the last tick.
- Inside policy scoring, as the risk term between predicted observation distributions and the agent's preference vector C.
- Inside the precision update, as the driver that reweights how much the agent trusts its sensory likelihood versus its transition model. This is the same "precision" dial exposed on the Precision Lab.
The point of exposing dials is honesty: if you move sensory precision up and behavior changes as the theory predicts, the theory has passed a small test in front of you. If it does not, the theory is wrong in a way you can see. That is the falsifier posture Class F. For a fuller conceptual pass on the inference machinery, see variational inference, a conceptual walkthrough.
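One way to see what a sensory-precision dial predicts is to temper the likelihood by an exponent and watch the posterior respond. The sketch below is an assumption-laden illustration of that idea only: the function `posterior_given`, the parameter `sensory_precision`, and all numbers are hypothetical, and the Precision Lab may implement its dial differently.

```python
prior = [0.5, 0.5]   # P(s), illustrative
A = [
    [0.7, 0.3],      # P(o | s = 0)
    [0.4, 0.6],      # P(o | s = 1)
]

def posterior_given(obs, sensory_precision):
    """Posterior over states after one observation, with a tempered likelihood.

    Assumption: precision acts as an exponent on each row of A, followed by
    renormalisation. Precision 0 flattens the likelihood (observations are
    ignored); large precision sharpens it (observations dominate).
    """
    tempered = []
    for row in A:
        powered = [p ** sensory_precision for p in row]
        total = sum(powered)
        tempered.append([x / total for x in powered])
    unnorm = [prior[s] * tempered[s][obs] for s in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]

ignored = posterior_given(obs=0, sensory_precision=0.0)
low = posterior_given(obs=0, sensory_precision=0.5)
mid = posterior_given(obs=0, sensory_precision=1.0)
high = posterior_given(obs=0, sensory_precision=4.0)

for label, post in [("0.0", ignored), ("0.5", low), ("1.0", mid), ("4.0", high)]:
    print(f"precision {label}: P(s = 0 | o = 0) = {post[0]:.3f}")
```

The testable prediction in this toy version is monotonic: as sensory precision rises, belief in the state that best explains the observation rises with it, and at zero precision the posterior stays at the prior.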
Why this is not "avoiding novelty"
A common misreading of the free energy principle is that agents that minimise surprise must therefore hide in dark rooms. The math says the opposite: because expected free energy contains the ambiguity term, an agent that only minimises risk will accept high ambiguity and pay for it later. Long-horizon minimisation of expected free energy rewards information-seeking now, in service of lower total surprise across the life of the policy Class E. See why minimizing surprise is not avoiding novelty for the extended argument.
The generative model is where the assumptions live
Everything above rests on a generative model: what states the agent represents, what observations it expects from each state, what transitions it predicts under each action, and what outcomes it prefers. Change the generative model and you change what "surprise" even means. The model is not neutral. It is the theory of the world the agent is carrying. Our companion piece on generative models, the organism model of its world unpacks how UNI chooses these pieces and where the modelling honesty fences sit.
How we know when we are wrong
Public gates matter more than eloquence. The Cell Lab is a pre-registered falsification benchmark: five claims and their falsification criteria were written before the runs, and losses are recorded as plainly as wins Class F. If the active-inference controller fails an unseen disturbance family that was in scope, the failure is on the page, not smoothed over. Read gates and falsifiers, how we know when we are wrong for the pattern we hold ourselves to.
Where the math goes next, if you want it hands-on
Two external resources sit well beside this pillar. Themesis offers a short primer, T3, Top Ten Terms in Statistical Mechanics for AI. We recommend it as prep for the UNI Workshop for math-hungry learners (linked as a factual companion, not an endorsement in either direction). Themesis also runs a hands-on course, Building Active Inference in Python. It is a complementary Python path on a different stack from our Elixir workbench, useful if you want a second implementation to triangulate against ours.
What this post is not
It is not a proof, it is not a clinical instrument, and it is not the claim that our system has general intelligence. It is a pillar overview, citing published work and showing the pieces we compute, so that a careful reader can move to the labs and interrogate the claims one at a time. The Zenodo preprint is unrefereed. Behavioral labels are computational phenotypes, hypotheses, not diagnoses.