Action Selection in the UNI Workbench, Universal Natural Intelligence

Action selection is the point where an active inference agent stops believing and starts moving. If you want to inspect UNI's beliefs about action selection, the honest place to look is not the paper but the loop in the workbench that scores each candidate policy and picks one. This post walks through that loop at the level a reader inspecting the code needs.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Read the loop, inspect the gates, and help us find where it fails.

The shape of the loop

At each tick the agent runs four steps: enumerate short policies at the current planning depth, roll each policy forward under the generative model to get predicted observations and states, score every policy with an expected free energy value G, then sample a policy from a softmax over the negative G vector (scaled by temperature) and execute its first action Class E. The form of G we use follows Parr, Pezzulo and Friston (2022), Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, MIT Press, chapter 7 Class E.

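// simplified, from workbench/agent/action_select.js
function selectAction(belief, model, prefs, depth, temperature) {
  // Enumerate every candidate action sequence of length `depth`.
  const policies = enumeratePolicies(model.actions, depth);
  // Score each policy from the current belief; lower G is better.
  const G = policies.map(pi => expectedFreeEnergy(pi, belief, model, prefs));
  // Softmax over -G / temperature, then sample one policy.
  const logits = G.map(g => -g / temperature);
  const probs  = softmax(logits);
  const chosen = sample(policies, probs);
  // Only the first action is acted on; policies, G, and probs are returned for inspection.
  return { firstAction: chosen[0], policies, G, probs };
}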
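
The snippet calls softmax and sample without showing them. A minimal sketch of what those two helpers could look like follows. The bodies are assumptions for illustration, not the workbench's own code, and the injectable rng parameter is a hypothetical addition that makes a trace replayable.

```javascript
// Hypothetical helpers: the names match the calls in selectAction,
// but these bodies are illustrative assumptions, not the workbench source.

// Numerically stable softmax: subtracting the largest logit before
// exponentiating avoids overflow in Math.exp without changing the result.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

// Draw one item according to probs by walking the cumulative distribution.
// rng defaults to Math.random; pass a seeded generator to replay a run.
function sample(items, probs, rng = Math.random) {
  const u = rng();
  let cumulative = 0;
  for (let i = 0; i < items.length; i++) {
    cumulative += probs[i];
    if (u < cumulative) return items[i];
  }
  // Rounding can leave the cumulative sum slightly below 1.
  return items[items.length - 1];
}
```

With G = [1.00, 1.01] and temperature 1, softmax([-1.00, -1.01]) gives roughly 0.5025 and 0.4975, so a near-tie stays a near-tie in the sampling probabilities rather than being resolved by argmin.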

Four seams are worth naming. enumeratePolicies is the combinatorial cost: the number of policies grows as the number of actions raised to the power of depth, which is why the Precision Lab caps depth at 2 by default Class C. expectedFreeEnergy is the scoring rule: it decomposes into an epistemic term (expected information gain about hidden state) and a pragmatic term (KL divergence from predicted observations to preferred observations) Class E. The temperature dial is the inverse of policy precision: low temperature approaches argmin over G, high temperature approaches uniform sampling Class E. And sample is the stochastic seam that keeps behavior from collapsing to a single-branch trace under near-ties Class C.
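
Two of these claims are easy to check numerically: the growth of the policy set with depth, and the two limits of the temperature dial. The sketch below assumes a policy is a plain array of actions. The body of enumeratePolicies, the policyProbs helper, and the four-action set are all hypothetical stand-ins, not the workbench's own code.

```javascript
// Hypothetical sketch: every action sequence of exactly `depth` steps.
function enumeratePolicies(actions, depth) {
  let policies = [[]];
  for (let step = 0; step < depth; step++) {
    // Extend each partial policy by every available action.
    policies = policies.flatMap(prefix => actions.map(a => [...prefix, a]));
  }
  return policies;
}

// Hypothetical helper: softmax over -G / temperature (max-subtracted for stability).
function policyProbs(G, temperature) {
  const logits = G.map(g => -g / temperature);
  const max = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const actions = ["up", "down", "left", "right"]; // illustrative action set
for (const depth of [1, 2, 3, 4]) {
  // Policy counts: 4, 16, 64, 256
  console.log(depth, enumeratePolicies(actions, depth).length);
}

const G = [1, 2, 3]; // made-up scores, lower is better
console.log(policyProbs(G, 0.01)); // nearly all mass on the lowest-G policy
console.log(policyProbs(G, 1000)); // close to uniform
```

At depth 2 with four actions the loop scores 16 policies per tick; at depth 4 it would score 256, which is the cost the depth cap is guarding against.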

Where the generative model plugs in

Every step above reads from the generative model. Transition matrices B, observation matrices A, preferences C, and the current posterior over hidden state Q(s) are passed as arguments, not compiled into the loop Class C. That separation is the point of the workbench: swap the model, keep the loop. The Echo Lab and Precision Lab share this same selectAction; they differ only in the A matrix (immediate wall sensors versus range-2 echolocation) and in the preference vector C over goal cells Class C.
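
To make the plug-in point concrete, here is a minimal sketch of a scoring function that takes A, B, C, and Q(s) purely as arguments. It is not the workbench's expectedFreeEnergy. The matrix layouts, the use of action indices as policy entries, and the treatment of C as a probability vector over observations are all assumptions. It uses one textbook form, G = -(expected information gain) - (expected log preference), summed over the steps of the policy; with exact Bayesian updates this is algebraically the same quantity as risk (the KL divergence from predicted to preferred observations) plus ambiguity.

```javascript
// Assumed layouts (hypothetical, for illustration only):
//   model.A[o][s]       = P(o | s)
//   model.B[a][next][s] = P(next | s, a)
//   prefs[o]            = preferred probability of observation o (the C vector)
//   belief[s]           = Q(s), the current posterior
const EPS = 1e-16; // keeps Math.log finite when a preference entry is exactly 0
const matVec = (M, v) => M.map(row => row.reduce((acc, m, j) => acc + m * v[j], 0));

function expectedFreeEnergy(policy, belief, model, prefs) {
  let epistemic = 0; // expected information gain about hidden state
  let pragmatic = 0; // expected log preference of predicted observations
  let qs = belief;
  for (const action of policy) {
    qs = matVec(model.B[action], qs); // predicted states under the policy
    const qo = matVec(model.A, qs);   // predicted observations under the policy
    // Mutual information between state and observation:
    // sum over o, s of Q(s) P(o|s) [ln P(o|s) - ln Q(o)]
    model.A.forEach((row, o) => row.forEach((p, s) => {
      if (p > 0 && qs[s] > 0) epistemic += qs[s] * p * (Math.log(p) - Math.log(qo[o]));
    }));
    pragmatic += qo.reduce((acc, q, o) => acc + q * Math.log(prefs[o] + EPS), 0);
  }
  // Lower G is better: both information gain and preference match reduce it.
  return { epistemic, pragmatic, G: -epistemic - pragmatic };
}

// Two states, a perfect sensor, action 0 = stay, action 1 = swap states.
const model = { A: [[1, 0], [0, 1]], B: [[[1, 0], [0, 1]], [[0, 1], [1, 0]]] };
const prefs = [0.1, 0.9]; // observation 1 is preferred
const stay = expectedFreeEnergy([0], [1, 0], model, prefs);
const move = expectedFreeEnergy([1], [1, 0], model, prefs);
console.log(stay.G > move.G); // true: moving reaches the preferred observation
```

Swapping the A matrix in this sketch changes the epistemic term without touching the function body, which is the same separation the labs rely on.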

Reading the G vector honestly

When you open the workbench overlay, the G vector is displayed per-policy, epistemic and pragmatic terms separated. A high pragmatic contribution with a near-zero epistemic contribution means the agent already knows the world well and is exploiting. A high epistemic contribution means the agent is choosing to probe. Neither is "correct". The behavioral regime you see is a function of your dial settings, not a property of the agent alone Class E.

There are two honesty fences worth noting in the code. First, the agent never reads the hidden state directly: the loop takes only belief (the posterior) and model, not ground truth Class C. Second, the softmax temperature is a knob, not a claim about biological plausibility. We do not model neuromodulation. We expose the parameter and let you move it Class C.
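
One way such a fence can be enforced structurally, sketched here under assumptions: keep the true state in a closure so that agent code can only ever receive observations. The makeEnvironment function and the corridor example are hypothetical and not taken from the workbench.

```javascript
// Hypothetical environment wrapper: the true state lives in a closure,
// so callers can only see what observe() returns.
function makeEnvironment(initialState, transition, observe) {
  let state = initialState; // ground truth, never exported
  return {
    step(action) {
      state = transition(state, action);
      return observe(state); // the only value that crosses the boundary
    },
  };
}

// Illustrative 1-D corridor with cells 0..4 and a wall sensor at each end.
const corridor = makeEnvironment(
  3,
  (pos, move) => Math.min(4, Math.max(0, pos + move)),
  pos => (pos === 0 || pos === 4 ? "wall" : "open")
);

console.log(corridor.step(+1)); // "wall": the agent learns it hit an end, not which cell it is in
console.log(Object.keys(corridor)); // ["step"]: no state field to read
```

The agent's belief then has to be built from the returned observations and the model alone, which matches the argument list of selectAction.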

What this is not

This is not a claim that UNI's action selection is the correct model of how brains choose. It is a working, inspectable implementation of the Parr, Pezzulo and Friston (2022) formulation, transparent enough for a reader to disagree with in specific places rather than in general. If you find a place where the code and the textbook diverge, open the file and tell us where. The transparency page collects those reports.
