Exact Bayesian inference is often intractable in the models an agent actually cares about. The ELBO (evidence lower bound) is the trick that turns that computational dead end into a working optimisation problem. It is the same object active inference calls variational free energy, wearing a different coat.
UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, help us find where it fails.
What the ELBO actually bounds
Given a generative model P(o, s) over observations
o and hidden states s, the log-evidence
log P(o) is what a Bayesian agent would ideally compute. To
get it you have to marginalise the joint over every possible hidden state,
which is often intractable for realistic state spaces (Class E, Parr,
Pezzulo and Friston, 2022, chapter 4).
Variational inference sidesteps this by picking an approximate posterior
Q(s) from a tractable family and writing:
log P(o) = ELBO(Q) + D_KL[Q(s) || P(s | o)]
The KL term is non-negative, so ELBO(Q) is a lower bound on
log P(o). Maximising the ELBO over Q does two
things at once: it tightens the bound, and it pulls Q(s)
toward the true posterior P(s | o). That is the whole game
(Class E, standard result in variational Bayes).
Read the identity carefully: the ELBO is not an approximation of the log-evidence, it is a bound. When the approximate posterior matches the true posterior, the bound is tight and the ELBO equals the log-evidence exactly.
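The identity can be checked numerically on a toy model. The sketch below assumes a hypothetical two-state model with made-up numbers for the prior and likelihood (none of these values come from the UNI labs); it computes the ELBO as E_Q[log P(o, s) - log Q(s)] and confirms that ELBO plus the KL gap recovers the log-evidence.

```python
import math

# Hypothetical two-state model for one fixed observation o (illustrative numbers only).
prior = [0.7, 0.3]        # P(s)
likelihood = [0.2, 0.9]   # P(o | s)

joint = [p * l for p, l in zip(prior, likelihood)]  # P(o, s)
evidence = sum(joint)                               # P(o), tractable here by enumeration
posterior = [j / evidence for j in joint]           # P(s | o)
log_evidence = math.log(evidence)


def elbo(q):
    """ELBO(Q) = E_Q[log P(o, s) - log Q(s)]; zero-probability states contribute nothing."""
    return sum(qi * (math.log(ji) - math.log(qi)) for qi, ji in zip(q, joint) if qi > 0)


def kl(q, p):
    """D_KL[q || p] for discrete distributions on the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


q = [0.5, 0.5]            # an arbitrary approximate posterior
gap = kl(q, posterior)    # how far Q is from the true posterior

print("log-evidence:", log_evidence)
print("ELBO + KL   :", elbo(q) + gap)
print("ELBO at the true posterior:", elbo(posterior))
```

With any valid Q the two printed sums agree up to floating-point error, and plugging in the true posterior closes the gap entirely.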
The sign flip: ELBO and variational free energy
Active inference works with variational free energy F(Q, o)
instead of the ELBO. The relationship is a sign flip:
F(Q, o) = -ELBO(Q)
Maximising the ELBO is minimising variational free energy. Because
F is always greater than or equal to surprise
(-log P(o)), minimising F tightens an upper
bound on surprise, the mirror image of tightening a lower bound on
log-evidence (Class E, Parr et al., 2022, chapter 2). The literature uses both framings.
Machine learning tends to speak ELBO; active inference tends to speak free
energy. The math is the same object.
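A small sketch of the free-energy framing, again on a hypothetical two-state model with illustrative numbers: every candidate Q gives a free energy at or above surprise, and the candidate with the lowest free energy is the one closest to the true posterior.

```python
import math

# Hypothetical two-state model for one fixed observation o (illustrative numbers only).
prior = [0.7, 0.3]        # P(s)
likelihood = [0.2, 0.9]   # P(o | s)
joint = [p * l for p, l in zip(prior, likelihood)]  # P(o, s)
surprise = -math.log(sum(joint))                    # -log P(o)


def free_energy(q):
    """F(Q, o) = E_Q[log Q(s) - log P(o, s)], i.e. the negative ELBO."""
    return sum(qi * (math.log(qi) - math.log(ji)) for qi, ji in zip(q, joint) if qi > 0)


# A coarse grid of candidate beliefs over the two states.
candidates = [[w, 1 - w] for w in (0.1, 0.3, 0.5, 0.7, 0.9)]
energies = [free_energy(q) for q in candidates]
best = min(candidates, key=free_energy)

for q, f in zip(candidates, energies):
    print(q, "F =", round(f, 4), "excess over surprise =", round(f - surprise, 4))
print("lowest free energy on the grid:", best)
```

The excess over surprise printed for each candidate is exactly the KL divergence from Q to the true posterior, which is why it never goes negative.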
Why the bound matters for tractability
Two things become possible once you have the ELBO in hand. First, you can
optimise Q by gradient ascent on the ELBO inside a chosen variational
family (mean-field, structured, or amortised), instead of trying to
integrate an intractable joint. Second, you can stop early: any
Q gives you a valid bound, so partial optimisation still
yields a usable estimate of belief, just a looser one. Perception under
time pressure is exactly this: a partial ascent up the ELBO, cashed in as
an approximate posterior (Class E, general variational Bayes).
For a fuller conceptual pass on the
inference machinery, see
variational inference, a conceptual walkthrough.
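To make the early-stopping point concrete, here is a minimal sketch of gradient ascent on the ELBO for a hypothetical three-state model. The softmax parameterisation of Q, the learning rate, and the step counts are all assumptions chosen for illustration, not a description of any particular implementation. Stopping after a few steps gives a looser but still valid bound.

```python
import math

# Hypothetical three-state model for one fixed observation o (illustrative numbers only).
prior = [0.5, 0.3, 0.2]       # P(s)
likelihood = [0.1, 0.6, 0.3]  # P(o | s)
joint = [p * l for p, l in zip(prior, likelihood)]  # P(o, s)
log_evidence = math.log(sum(joint))


def softmax(theta):
    """Map unconstrained logits to a probability vector."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def elbo(q):
    """ELBO(Q) = E_Q[log P(o, s) - log Q(s)]; q is strictly positive under softmax."""
    return sum(qi * (math.log(ji) - math.log(qi)) for qi, ji in zip(q, joint))


def ascend(steps, lr=0.5):
    """Plain gradient ascent on the ELBO with Q = softmax(theta), starting from uniform Q."""
    theta = [0.0] * len(joint)
    for _ in range(steps):
        q = softmax(theta)
        value = elbo(q)
        # d ELBO / d theta_k = q_k * (log P(o, s_k) - log q_k - ELBO)
        theta = [t + lr * qi * (math.log(ji) - math.log(qi) - value)
                 for t, qi, ji in zip(theta, q, joint)]
    return softmax(theta)


q_early = ascend(5)    # stopped under "time pressure"
q_late = ascend(200)   # run close to convergence

print("log-evidence     :", log_evidence)
print("ELBO after 5     :", elbo(q_early))
print("ELBO after 200   :", elbo(q_late))
```

Both runs stay below the log-evidence; the longer run simply leaves a smaller gap.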
Two decompositions, one identity
The ELBO has two decompositions used constantly in the active-inference literature. The first is the accuracy-complexity decomposition:
ELBO(Q) = E_Q[log P(o | s)] - D_KL[Q(s) || P(s)]
- Accuracy: expected log-likelihood of the observation under current beliefs.
- Complexity: KL divergence between the approximate posterior and the prior, the cost of updating beliefs away from where they were.
The second is the evidence-KL decomposition already given above, which is the one that establishes the bound. Both drop out of the same identity by rearranging the log-joint. If either derivation feels shaky, slow down on what KL divergence actually measures before pushing further.
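A quick numerical check that the two decompositions describe the same quantity, using a hypothetical two-state model and an arbitrary Q (all numbers are illustrative assumptions):

```python
import math

# Hypothetical two-state model for one fixed observation o (illustrative numbers only).
prior = [0.7, 0.3]        # P(s)
likelihood = [0.2, 0.9]   # P(o | s)
q = [0.4, 0.6]            # an arbitrary approximate posterior Q(s)


def kl(a, b):
    """D_KL[a || b] for discrete distributions on the same support."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


# Accuracy-complexity form: E_Q[log P(o | s)] - D_KL[Q(s) || P(s)]
accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood))
complexity = kl(q, prior)
elbo_accuracy_complexity = accuracy - complexity

# Evidence-KL form: log P(o) - D_KL[Q(s) || P(s | o)]
joint = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(joint)
posterior = [j / evidence for j in joint]
elbo_evidence_kl = math.log(evidence) - kl(q, posterior)

print("accuracy - complexity :", elbo_accuracy_complexity)
print("log-evidence - KL     :", elbo_evidence_kl)
```

The two printed values agree up to floating-point error for any valid Q, since both are rearrangements of E_Q[log P(o, s) - log Q(s)].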
What the ELBO looks like inside UNI
The UNI labs run a discrete-time POMDP active-inference core in the browser. Static inspection of that core (Class C) shows the ELBO appearing as variational free energy at each perceptual tick: an accuracy term against the current observation, and a complexity term against the transition-propagated prior from the previous tick. The precision dials on the Precision Lab modulate how much weight the accuracy term carries relative to the complexity term. That is a behavior you can watch: the same observation, at different sensory precisions, yields different posteriors because the ELBO is being ascended on a differently weighted surface. Falsifier posture applies (Class F): if moving precision does not shift behavior as the decomposition predicts, the implementation or the theory is wrong in a way you can see. See our companion piece on generative models, the organism model of its world for how UNI chooses the model pieces the ELBO is bounding.
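The effect of a precision dial can be illustrated with a toy sketch. This is not the UNI core's code: it assumes, for illustration only, that sensory precision enters as a weight gamma on the accuracy term, so the objective is gamma * E_Q[log P(o | s)] - D_KL[Q(s) || P(s)]. Under that assumption the maximising belief is proportional to P(s) * P(o | s)^gamma. The names `posterior_at` and `gamma`, and all numbers, are hypothetical.

```python
import math

# Hypothetical numbers: a transition-propagated prior and a likelihood for one observation.
prior = [0.7, 0.3]        # prior belief before seeing the observation
likelihood = [0.2, 0.9]   # P(o | s): the observation favours the second state


def posterior_at(gamma):
    """Belief maximising gamma * accuracy - complexity: Q(s) proportional to P(s) * P(o | s)**gamma.

    gamma is an assumed sensory-precision weight; gamma = 1 recovers exact Bayes.
    """
    unnorm = [p * (l ** gamma) for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def weighted_elbo(q, gamma):
    """Precision-weighted objective: gamma * E_Q[log P(o | s)] - D_KL[Q || prior]."""
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, likelihood))
    complexity = sum(qi * math.log(qi / pi) for qi, pi in zip(q, prior) if qi > 0)
    return gamma * accuracy - complexity


for gamma in (0.0, 0.5, 1.0, 4.0):
    q = posterior_at(gamma)
    print("gamma =", gamma, "posterior =", [round(x, 4) for x in q])
```

At gamma = 0 the observation is ignored and the belief stays at the prior; as gamma grows, the same observation pulls the belief further toward the state it favours, which is the qualitative shift the falsifier would look for.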
What this post is not
It is not a derivation, it is not a clinical instrument, and it is not the claim that our system has general intelligence. It is a conceptual map of one identity, cited to Parr, Pezzulo and Friston (2022), grounded in code inspection of the UNI core. The Zenodo preprint is unrefereed. Behavioral labels in the labs are hypotheses, not diagnoses.