Epistemic vs Pragmatic Value

Michael Polzin · 2026-07-01 · Working hypothesis on the attainable path toward General Natural Intelligence, natural not artificial.

A learning agent has two things it can want from the next action: to find out what it does not yet know, and to bring about what it prefers. Expected free energy makes that split formal. It is the single quantity an active-inference agent minimizes over policies, and it decomposes cleanly into an information term and a preference term.

In the Parr, Pezzulo and Friston formulation (Class E), the expected free energy of a policy pi, denoted G(pi), is written as the expected divergence between the predicted distribution over outcomes and the prior preferences over outcomes, minus the expected information gain about hidden states from future observations. The convention that matters here is minimization: the agent picks the policy with the lowest G. Writing the decomposition with signs chosen so that lower is better for the agent:

G(pi) = pragmatic_cost(pi) − epistemic_value(pi)

pragmatic_cost(pi) = E_q[ ln q(o | pi) − ln p(o | C) ]   (KL between predicted outcomes and preferred outcomes)

epistemic_value(pi) = E_q[ ln q(s | o, pi) − ln q(s | pi) ]   (expected information gain about hidden states)

Read that as two forces pulling on the same policy. The pragmatic term punishes policies whose predicted outcome distribution q(o | pi) drifts from the agent's prior preferences p(o | C). The epistemic term rewards policies whose future observations are expected to sharpen the posterior over hidden states, that is, actions that will teach the agent something. Minimizing G means the agent is simultaneously trying to bring the world toward its preferences and trying to reduce its own uncertainty about the world. (Class E)
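As a minimal sketch of the two terms, assuming a one-step, discrete toy model: the function name expected_free_energy, the likelihood matrix A, and the example numbers are all illustrative assumptions, not taken from the labs or from any particular library. The code computes pragmatic_cost and epistemic_value exactly as written in the decomposition and subtracts one from the other.

```python
import numpy as np

def expected_free_energy(A, qs, log_C, eps=1e-16):
    """One-step G(pi) = pragmatic_cost - epistemic_value for a discrete toy model.

    A     : assumed likelihood p(o | s), shape (n_obs, n_states), columns sum to 1
    qs    : predicted states q(s | pi), shape (n_states,)
    log_C : log prior preferences ln p(o | C), shape (n_obs,)
    """
    qo = A @ qs                          # predicted outcomes q(o | pi)
    joint = A * qs                       # q(o, s | pi), shape (n_obs, n_states)
    post = joint / (qo[:, None] + eps)   # q(s | o, pi) by Bayes' rule

    # KL between predicted outcomes and preferred outcomes
    pragmatic_cost = float(np.sum(qo * (np.log(qo + eps) - log_C)))
    # Expected information gain about hidden states
    epistemic_value = float(
        np.sum(joint * (np.log(post + eps) - np.log(qs + eps)))
    )
    return pragmatic_cost - epistemic_value, pragmatic_cost, epistemic_value

# Illustrative numbers only: two hidden states, two outcomes.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
qs = np.array([0.5, 0.5])
log_C = np.log(np.array([0.8, 0.2]))

G, pragmatic_cost, epistemic_value = expected_free_energy(A, qs, log_C)
print(f"pragmatic_cost={pragmatic_cost:.3f} nats, "
      f"epistemic_value={epistemic_value:.3f} nats, G={G:.3f} nats")
```

In this toy setup the epistemic term is the mutual information between states and outcomes under the policy, so it is never negative, and a policy can reach a low G either by matching preferences or by promising informative observations.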

Why the split is the design surface

In a bandit or reinforcement-learning frame, exploration versus exploitation is usually a bolt-on: an epsilon parameter, a softmax temperature, a Thompson-sampling posterior. In active inference the split falls out of the same objective the agent is already optimizing (Class E). There is no separate exploration knob. If the predicted belief q(s | pi) over hidden states is already sharp, further observations can add little, so the epistemic term goes to zero and pragmatic cost dominates: the agent exploits. If that belief is broad, epistemic value dominates and the agent probes.
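To make the "no separate knob" point concrete, here is a small sketch under assumed numbers: a fixed two-state likelihood A (hypothetical) and a belief q(s | pi) that is swept from broad to sharp. The helper expected_info_gain is an illustrative name. Nothing about exploration is tuned; only the belief changes.

```python
import numpy as np

def expected_info_gain(A, qs, eps=1e-16):
    """Epistemic value E_q[ ln q(s | o, pi) - ln q(s | pi) ] for one step."""
    qo = A @ qs                          # predicted outcomes q(o | pi)
    joint = A * qs                       # q(o, s | pi)
    post = joint / (qo[:, None] + eps)   # q(s | o, pi)
    return float(np.sum(joint * (np.log(post + eps) - np.log(qs + eps))))

# Assumed likelihood: each state produces its own outcome 90% of the time.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])

gains = []
for p in [0.5, 0.7, 0.9, 0.99, 0.999]:   # belief in state 0, broad to sharp
    qs = np.array([p, 1.0 - p])
    gains.append(expected_info_gain(A, qs))
    print(f"q(s=0 | pi)={p:<5}  epistemic_value={gains[-1]:.4f} nats")
```

With the likelihood held fixed, the epistemic value shrinks steadily as the belief sharpens, which is the mechanism by which the same objective shifts from probing to exploiting.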

This is what makes the split useful when we build the labs. In the Precision Lab, raising sensory precision compresses q(s | o, pi), which shrinks the epistemic term for most policies. The agent stops wandering and heads for goal cells. Lower the precision and the opposite happens: broad posteriors, high expected information gain from many policies, and the agent looks curious. Same equation, same agent, different dial. (Class C)

Honesty fence

"Expected free energy" here is the variational free energy of inference, in nats, evaluated over predicted futures. It is not a thermodynamic quantity, and calling one term "epistemic" does not mean the agent knows it is exploring. The label describes the mathematical role of the term, not a subjective state.

The failure modes are the interesting part

Two failure modes fall directly out of the decomposition, and both are visible in the labs.

First, if p(o | C) is too narrow (the agent "prefers" too specific an outcome), pragmatic cost dominates from the start, epistemic value gets crowded out, and the agent commits early to a policy it does not yet have evidence for. It looks decisive and is often wrong. Second, if the generative model over hidden states is miscalibrated so that q(s | o, pi) never sharpens no matter how many observations arrive, the epistemic term never falls, and the agent probes forever without settling. It looks curious and never delivers. (Class C)
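The first failure mode can be sketched with a toy two-policy choice. Everything here is an assumption for illustration: the policy names "probe" and "commit", their likelihood matrices, and the sharpness parameter that narrows p(o | C) through a softmax. It is not the labs' implementation, and it does not cover the second failure mode.

```python
import numpy as np

def expected_free_energy(A, qs, log_C, eps=1e-16):
    """One-step G(pi) = pragmatic_cost - epistemic_value."""
    qo = A @ qs
    joint = A * qs
    post = joint / (qo[:, None] + eps)
    pragmatic_cost = float(np.sum(qo * (np.log(qo + eps) - log_C)))
    epistemic_value = float(
        np.sum(joint * (np.log(post + eps) - np.log(qs + eps)))
    )
    return pragmatic_cost - epistemic_value

def log_preferences(sharpness):
    """ln p(o | C) as a log-softmax that favors outcome 2 (the 'goal')."""
    utility = sharpness * np.array([0.0, 0.0, 1.0])
    return utility - np.log(np.sum(np.exp(utility)))

# Outcomes: 0 = cue A, 1 = cue B, 2 = goal. Two hidden states.
policies = {
    # Informative about the hidden state, never yields the goal outcome.
    "probe": np.array([[0.9, 0.1],
                       [0.1, 0.9],
                       [0.0, 0.0]]),
    # Mostly yields the goal outcome, says nothing about the hidden state.
    "commit": np.array([[0.1, 0.1],
                        [0.1, 0.1],
                        [0.8, 0.8]]),
}
qs = np.array([0.5, 0.5])   # broad belief: the agent has no evidence yet

def best_policy(sharpness):
    log_C = log_preferences(sharpness)
    scores = {name: expected_free_energy(A, qs, log_C)
              for name, A in policies.items()}
    return min(scores, key=scores.get), scores

for sharpness in [0.0, 4.0]:
    choice, scores = best_policy(sharpness)
    print(f"sharpness={sharpness}: picks {choice}  "
          + ", ".join(f"G({k})={v:.3f}" for k, v in scores.items()))
```

In this sketch the belief over hidden states is equally broad in both runs. The only change is how narrow the preferences are, and that alone flips the choice from probing to committing: epistemic value is bounded by the uncertainty available to resolve, while pragmatic cost grows without bound as p(o | C) narrows.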

Both modes are also useful diagnostics for something that is not an active-inference agent. When a team keeps re-scoping a decision under the label of "gathering more data" and never converges, that is the second failure mode, in a form you can name.

Themesis on the same idea

AJ Maren has a video walking through why the epistemic-pragmatic split matters as machine-learning history rhymes forward through transformers into active inference. We link it as a companion, in her voice: "Deep Learning Did It. Transformers Did It. Active Inference Just Did It Again (Part 1)". Our one-line frame, in our voice: a public explainer we found useful when introducing the two-term decomposition to people who already know gradient descent but have not met free-energy minimization. This is a factual reference, not an endorsement in either direction.

What to read next

Expected Free Energy and Goal-Directed Action

How the same G(pi) gives rise to action selection that looks goal-directed without a separate reward signal.

Why Minimizing Surprise Is Not Avoiding Novelty

The most common misread of the free-energy principle, and why epistemic value is the reason curiosity fits.

KL Divergence and Bayesian Inference in Active Inference

The KL term at the heart of the pragmatic cost, and how variational inference makes it tractable.

The Workshop

Where these ideas get taken from equation to running system, tightly qualified and evidence-classed.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.