Closed-form Bayesian updates look like magic the first time you meet them. You count, you add, you normalize, and out drops a posterior. The magic is not free. It is a modeling contract about the shape of your uncertainty, and when the contract matches your problem the algebra sings. When it does not, the closed form quietly hides the work you still need to do.
The contract, in one paragraph
A prior is called conjugate to a likelihood when the posterior lives in the same parametric family as the prior. The Beta is conjugate to the Bernoulli. The Dirichlet is conjugate to the categorical and the multinomial. The Gamma is conjugate to the Poisson rate. A Normal prior on the mean is conjugate to a Normal likelihood with known variance. Parr, Pezzulo and Friston set this out cleanly in their treatment of the generative model, then use Dirichlet priors over the categorical parameters of active-inference POMDPs so that learning the A, B, and D matrices reduces to incrementing counts (Class E: Parr, Pezzulo, Friston, Active Inference, MIT Press 2022).
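To make "incrementing counts" concrete, here is a minimal sketch of the Dirichlet-categorical update. The function names and the three-outcome example are illustrative assumptions, not code from the UNI labs or from the book.

```python
from typing import List, Sequence


def dirichlet_update(concentration: Sequence[float], outcome: int) -> List[float]:
    """Return the posterior concentration after one categorical observation."""
    if not 0 <= outcome < len(concentration):
        raise IndexError("outcome index out of range")
    posterior = list(concentration)
    posterior[outcome] += 1.0  # conjugacy: the whole update is one increment
    return posterior


def posterior_mean(concentration: Sequence[float]) -> List[float]:
    """Expected categorical probabilities under a Dirichlet."""
    total = sum(concentration)
    return [a / total for a in concentration]


# Hypothetical example: flat prior over three outcomes, five observations.
alpha = [1.0, 1.0, 1.0]
for observed in [0, 1, 1, 1, 2]:
    alpha = dirichlet_update(alpha, observed)

print(alpha)                  # [2.0, 4.0, 2.0]
print(posterior_mean(alpha))  # [0.25, 0.5, 0.25]
```

The posterior is again a Dirichlet, so the next observation goes through the same function with no extra machinery.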
Why the algebra pays off
Three concrete wins, all consequences of the same closed-form update (Class C):
- Cheap updates. Adding a count is O(1). You do not run a variational optimizer, you do not sample. In the UNI Cell Lab this means the learning loop can tick at the same rate as the control loop without a compute budget conversation.
- Interpretable sufficient statistics. The posterior's parameters are counts (or pseudo-counts). A Dirichlet with concentration (3, 7, 2) is a claim you can read out loud: a total weight of three on the first outcome, seven on the second, two on the third, counting both the observations and whatever prior weight you started with.
- Analytical KL. KL divergences between two Dirichlets, two Betas, or two Gaussians have closed forms. In an active-inference agent whose expected free energy decomposes into a KL term against a preferred outcome distribution, that closed form is a real speedup, not a rounding error.
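The analytical KL is short enough to write out. The sketch below uses only the standard library, so it carries its own small digamma approximation (upward recurrence plus an asymptotic series). That helper and the function names are assumptions made for illustration; in a real project a library digamma would be the safer choice.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """Approximate digamma for x > 0: recurrence up to x >= 6, then a series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def kl_dirichlet(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Closed-form KL( Dir(alpha) || Dir(beta) ), in nats."""
    if len(alpha) != len(beta):
        raise ValueError("concentration vectors must have equal length")
    a0, b0 = sum(alpha), sum(beta)
    # Difference of the two log normalizing constants.
    log_norm = (math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)
                - math.lgamma(b0) + sum(math.lgamma(b) for b in beta))
    # Expected log-ratio term under Dir(alpha).
    psi_a0 = digamma(a0)
    expect = sum((a - b) * (digamma(a) - psi_a0) for a, b in zip(alpha, beta))
    return log_norm + expect


# Sanity check against a case with a known answer:
# KL( Dir(1, 1) || Dir(2, 2) ) equals 2 - ln 6 analytically.
print(kl_dirichlet([1.0, 1.0], [2.0, 2.0]), 2 - math.log(6))
```

No sampling and no optimizer appear anywhere: the cost is a handful of special-function evaluations per category.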
Where the closed form hides work
The contract is that your uncertainty is Dirichlet-shaped, or Beta-shaped, or Gaussian-shaped. If the world hands you uncertainty of a different shape, the closed form does not warn you. It just gives you a wrong posterior with the right units.
- Multimodality. A Beta posterior is unimodal on the interval. If the true belief has two peaks (two plausible regimes for a coin), a Beta cannot express it. The algebra will still return a number.
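- Heavy tails. A Normal prior on the mean of a Normal likelihood with known variance is conjugate, elegant, and wrong for rare-but-large events. Its posterior variance shrinks with the number of observations alone, whatever their values, so it gets confident faster than the data warrants.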
- Correlated categories. A Dirichlet treats the categorical parameters as independent up to the simplex constraint. If two categories are structurally linked (e.g. "cache down" and "database flaky" tend to co-occur in a service cell), Dirichlet learning smears that structure into two separate counts and loses it.
- Prior-as-alibi. A concentration of 1 everywhere is not neutral. It is an assertion that every outcome is equally likely, held with the weight of one prior observation per outcome. In small-sample regimes that assertion dominates the posterior. The closed form will not tell you your prior is doing all the work.
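Two of these failure modes can be put in numbers with a few lines. The sketch below is illustrative only: the function names and the example values are assumptions. The first function reports how much of a Dirichlet posterior's total weight comes from the prior. The second gives the posterior variance of a Normal mean under known observation variance, and it takes no observed values as arguments, because the variance depends on the sample size alone.

```python
from typing import Sequence


def prior_share(prior: Sequence[float], counts: Sequence[float]) -> float:
    """Fraction of the posterior concentration contributed by the prior."""
    prior_total = sum(prior)
    return prior_total / (prior_total + sum(counts))


def normal_posterior_variance(prior_var: float, noise_var: float, n: int) -> float:
    """Posterior variance of a Normal mean with known noise variance.

    Only n enters the formula; the observed values never do.
    """
    return 1.0 / (1.0 / prior_var + n / noise_var)


# Prior-as-alibi: a "flat" prior of ones against two observations.
print(prior_share([1.0, 1.0, 1.0], [2, 0, 0]))  # 0.6

# Heavy tails: nine observations give the same variance
# whether or not one of them was an extreme outlier.
print(normal_posterior_variance(1.0, 1.0, 9))   # 0.1
```

In the first case 60% of the posterior weight is the prior talking. In the second, a single extreme observation would shift the posterior mean but could never widen the posterior.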
A working test
Before accepting a conjugate prior, ask three questions. Does the family admit the shape of belief you actually hold? Does the sufficient-statistic story match how observations arrive? Would a mixture, a hierarchical model, or a non-parametric prior express something the closed form cannot? If either of the first two answers is no, or the third is yes, the closed form is fine as a first pass and a liability as a final answer (Class C).
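As one illustration of the third question, a finite mixture of Betas can hold a two-peaked belief about a coin and still updates in closed form: each component updates as usual, and the mixture weights are rescaled by each component's marginal likelihood. This is a minimal sketch under assumed names and an assumed two-regime prior, not a description of how the UNI labs model anything.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, a, b) for one Beta component


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def update_beta_mixture(components: List[Component],
                        heads: int, tails: int) -> List[Component]:
    """Posterior of a Beta mixture after observing heads and tails counts."""
    log_weights = []
    updated = []
    for weight, a, b in components:
        # Marginal likelihood of the data under this component,
        # up to a binomial constant shared by all components.
        log_ml = log_beta(a + heads, b + tails) - log_beta(a, b)
        log_weights.append(math.log(weight) + log_ml)
        updated.append((a + heads, b + tails))
    peak = max(log_weights)  # subtract the max for numerical stability
    unnorm = [math.exp(lw - peak) for lw in log_weights]
    total = sum(unnorm)
    return [(u / total, a, b) for u, (a, b) in zip(unnorm, updated)]


# Hypothetical prior: the coin is either tails-biased or heads-biased.
prior = [(0.5, 2.0, 8.0), (0.5, 8.0, 2.0)]
posterior = update_beta_mixture(prior, heads=7, tails=1)
for weight, a, b in posterior:
    print(round(weight, 3), a, b)
```

After seven heads and one tail, almost all of the weight moves to the heads-biased component. A single Beta given the same data would have to average the two regimes into one peak.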
How UNI uses this in practice
In the UNI POMDP labs the categorical parameters of the generative model carry Dirichlet priors, and learning consists of incremental count updates against observed outcomes (Class C). That choice is deliberate: the agent's belief about state transitions and observation likelihoods is genuinely categorical, the Dirichlet shape is a fair match, and the closed form keeps the labs runnable in a browser tab with no backend. The Cell Lab benchmark shows the resulting controller does well in most disturbance families and loses in three, which is exactly the kind of loss you would expect a unimodal count-based posterior to take when the world briefly changes regime. The failure is legible because the prior is legible. That is the payoff of the contract, not a workaround for it.
For the same reason we do not reach for conjugacy inside the higher-level planning objective, where preferences over outcome distributions are more naturally written as targets in the same simplex the model already lives in, and where KL divergences enter the expected free energy directly. UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.