Action selection is the point where an active inference agent stops believing and starts moving. If you want to inspect UNI's beliefs about action selection, the honest place to look is not the paper but the loop in the workbench that scores each candidate policy and picks one. This post walks through that loop at the level of detail a reader inspecting the code needs.
UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Read the loop, inspect the gates, and help us find where it fails.
The shape of the loop
At each tick the agent runs four steps: enumerate short policies up to the current planning depth, roll each policy forward under the generative model to get predicted states and observations, score every policy with an expected free energy value G, then sample a policy from a softmax over the negative G vector and execute its first action (Class E). The form of G we use follows Parr, Pezzulo and Friston (2022), Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, MIT Press, chapter 7 (Class E).
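The four steps can be sketched in a few lines. This is a minimal Python sketch, not the workbench source: the snake_case names mirror the identifiers discussed in this post, the `score_policy` callback stands in for the roll-forward and scoring steps, and `toy_score` is a hypothetical cost invented purely so the loop runs end to end.

```python
import itertools
import math
import random


def enumerate_policies(n_actions, depth):
    """Every action sequence of length `depth`: n_actions ** depth of them."""
    return list(itertools.product(range(n_actions), repeat=depth))


def softmax_neg(G, temperature):
    """Softmax over -G / temperature, shifted by the max for stability."""
    logits = [-g / temperature for g in G]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(score_policy, n_actions, depth, temperature, rng):
    policies = enumerate_policies(n_actions, depth)        # step 1: enumerate
    G = [score_policy(pi) for pi in policies]              # steps 2-3: roll forward, score
    probs = softmax_neg(G, temperature)                    # step 4: softmax over -G ...
    chosen = rng.choices(policies, weights=probs, k=1)[0]  # ... then sample a policy
    return chosen[0], policies, G, probs                   # act on its first action


def toy_score(policy):
    """Hypothetical stand-in for G: one unit of cost per non-zero action."""
    return float(sum(1 for a in policy if a != 0))


rng = random.Random(0)
action, policies, G, probs = select_action(toy_score, 3, 2, 0.5, rng)
print(len(policies), action, [round(p, 3) for p in probs])
```

Passing the scorer in as a function keeps the loop ignorant of how G is computed, which is the property the rest of this post leans on.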
Four seams are worth naming. enumeratePolicies is the combinatorial cost: the number of policies grows as actions to the power of depth, which is why the Precision Lab caps depth at 2 by default (Class C). expectedFreeEnergy is the scoring rule: it decomposes into an epistemic term (expected information gain about hidden state) and a pragmatic term (KL divergence from predicted observations to preferred observations) (Class E). The temperature dial sets policy precision (precision is the inverse of temperature): low temperature approaches argmin over G, and high temperature approaches uniform sampling (Class E). And sample is the stochastic seam that keeps behavior from collapsing to a single-branch trace under near-ties (Class C).
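To make the scoring rule concrete, here is a one-step sketch of an expected free energy calculation in Python. It is an illustration under stated assumptions, not the workbench's expectedFreeEnergy: the matrix layouts (`A[o][s]`, `B_a[s'][s]`), the two-state toy model, and all numbers are hypothetical. It uses the textbook form, pragmatic cost (negative expected log-preference) minus expected information gain, which is algebraically the same quantity as risk (the KL divergence from predicted to preferred observations) plus ambiguity.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def expected_free_energy(A, B_a, C, q):
    """One-step G for one action. Returns (G, epistemic, pragmatic_cost).

    A[o][s]    = P(o | s)            (assumed layout)
    B_a[s2][s] = P(s2 | s, action)   (assumed layout)
    C[o]       = preferred probability of observation o, all entries > 0
    q[s]       = current posterior over hidden state
    """
    n_obs, n_states = len(A), len(q)
    # Predicted next-state belief: Q(s') = sum_s B_a[s'][s] * Q(s)
    qs = [sum(B_a[s2][s] * q[s] for s in range(n_states))
          for s2 in range(n_states)]
    # Predicted observations: Q(o) = sum_s' A[o][s'] * Q(s')
    qo = [sum(A[o][s] * qs[s] for s in range(n_states))
          for o in range(n_obs)]
    # Ambiguity: expected entropy of the likelihood under Q(s')
    ambiguity = sum(qs[s] * entropy([A[o][s] for o in range(n_obs)])
                    for s in range(n_states))
    # Epistemic term: expected information gain about hidden state
    epistemic = entropy(qo) - ambiguity
    # Pragmatic cost: negative expected log-preference
    pragmatic_cost = -sum(qo[o] * math.log(C[o])
                          for o in range(n_obs) if qo[o] > 0.0)
    return pragmatic_cost - epistemic, epistemic, pragmatic_cost


# Hypothetical two-state world with a noisy sensor.
A = [[0.9, 0.1], [0.1, 0.9]]
B_stay = [[1.0, 0.0], [0.0, 1.0]]
B_switch = [[0.0, 1.0], [1.0, 0.0]]
C = [0.8, 0.2]   # observation 0 is preferred
q = [0.9, 0.1]   # the agent is fairly sure it is in state 0

g_stay, ep_stay, pr_stay = expected_free_energy(A, B_stay, C, q)
g_switch, ep_switch, pr_switch = expected_free_energy(A, B_switch, C, q)
print(round(g_stay, 4), round(g_switch, 4))
```

In this toy setup both actions are equally informative, so the difference in G comes entirely from the pragmatic cost, and staying scores lower (better) than switching.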
Where the generative model plugs in
Every step above reads from the generative model. Transition matrices B, observation matrices A, preferences C, and the current posterior over hidden state Q(s) are passed as arguments, not compiled into the loop (Class C). That separation is the point of the workbench: swap the model, keep the loop. The Echo Lab and Precision Lab share the same selectAction; they differ only in the A matrix (immediate wall sensors versus range-2 echolocation) and in the preference vector C over goal cells (Class C).
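A sketch of what "swap the model, keep the loop" can look like in Python follows. Everything here is assumed for illustration: the `Model` tuple, the depth-1 `select_action`, and the two toy A matrices (`sharp_A`, `blurry_A`) are hypothetical stand-ins, not the Echo Lab or Precision Lab matrices. Note that `select_action` receives only a belief and a model, never a ground-truth state.

```python
import math
import random
from typing import NamedTuple


class Model(NamedTuple):
    A: list  # A[o][s] = P(o | s)            (assumed layout)
    B: list  # B[a][s2][s] = P(s2 | s, a)    (assumed layout)
    C: list  # C[o] = preferred probability of observation o


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)


def score(belief, model, action):
    """One-step expected free energy: pragmatic cost minus information gain."""
    A, B_a, C = model.A, model.B[action], model.C
    n_obs, n_states = len(A), len(belief)
    qs = [sum(B_a[s2][s] * belief[s] for s in range(n_states))
          for s2 in range(n_states)]
    qo = [sum(A[o][s] * qs[s] for s in range(n_states)) for o in range(n_obs)]
    ambiguity = sum(qs[s] * entropy([A[o][s] for o in range(n_obs)])
                    for s in range(n_states))
    epistemic = entropy(qo) - ambiguity
    pragmatic_cost = -sum(qo[o] * math.log(C[o])
                          for o in range(n_obs) if qo[o] > 0.0)
    return pragmatic_cost - epistemic


def select_action(belief, model, temperature, rng):
    """Depth-1 loop: inputs are belief and model only, no hidden state."""
    actions = list(range(len(model.B)))
    G = [score(belief, model, a) for a in actions]
    top = max(-g / temperature for g in G)
    weights = [math.exp(-g / temperature - top) for g in G]
    return rng.choices(actions, weights=weights, k=1)[0], G


B = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
     [[0.0, 1.0], [1.0, 0.0]]]   # action 1: switch
C = [0.8, 0.2]
sharp_A = [[0.9, 0.1], [0.1, 0.9]]    # hypothetical precise sensor
blurry_A = [[0.6, 0.4], [0.4, 0.6]]   # hypothetical noisy sensor

belief = [0.9, 0.1]
rng = random.Random(1)
a_sharp, G_sharp = select_action(belief, Model(sharp_A, B, C), 1.0, rng)
a_blurry, G_blurry = select_action(belief, Model(blurry_A, B, C), 1.0, rng)
print([round(g, 3) for g in G_sharp], [round(g, 3) for g in G_blurry])
```

The same function runs against both models; only the A matrix changed. With the noisier sensor the gap between the two G values shrinks, so the softmax has less to separate the actions. A different preference vector C could be swapped in the same way.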
Reading the G vector honestly
When you open the workbench overlay, the G vector is displayed per policy, with epistemic and pragmatic terms separated. A high pragmatic contribution with a near-zero epistemic contribution means the agent already knows the world well and is exploiting. A high epistemic contribution means the agent is choosing to probe. Neither is "correct". The behavioral regime you see is a function of your dial settings, not a property of the agent alone (Class E).
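A small Python sketch shows how much the dial alone changes what you see. The per-policy numbers below are invented for illustration and are not workbench output; the sign convention (G equals pragmatic cost minus epistemic gain) and the policy labels are also assumptions.

```python
import math


def softmax_neg(G, temperature):
    """Softmax over -G / temperature, shifted by the max for stability."""
    logits = [-g / temperature for g in G]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical per-policy terms: (epistemic gain, pragmatic cost).
terms = {
    "probe": (0.60, 1.10),
    "exploit": (0.05, 0.40),
    "wander": (0.20, 1.30),
}
names = list(terms)
G = [cost - gain for gain, cost in terms.values()]

sweep = {}
for temperature in (0.05, 1.0, 50.0):
    sweep[temperature] = softmax_neg(G, temperature)
    row = ", ".join(f"{n}={p:.3f}" for n, p in zip(names, sweep[temperature]))
    print(f"T={temperature:<5} {row}")
```

The G vector is identical in all three rows. At the lowest temperature nearly all probability sits on the lowest-G policy; at the highest, the three policies are close to equally likely. The difference between those regimes comes from the temperature setting, not from any change in the agent's beliefs.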
There are two honesty fences worth noting in the code. First, the agent never reads the hidden state directly: the loop takes only the belief (the posterior) and the model, not ground truth (Class C). Second, the softmax temperature is a knob, not a claim about biological plausibility. We do not model neuromodulation; we expose the parameter and let you move it (Class C).
What this is not
This is not a claim that UNI's action selection is the correct model of how brains choose. It is a working, inspectable implementation of the Parr, Pezzulo and Friston (2022) formulation, transparent enough for a reader to disagree with in specific places rather than in general. If you find a place where the code and the textbook diverge, open the file and tell us where. The transparency page collects those reports.