Generative models

Prior Preferences and Goal-Directed Behavior

By Michael Polzin, 2026-07-01. Reading time about 5 minutes. Working hypothesis, on the attainable path toward General Natural Intelligence, natural not artificial.

In active inference, an agent does not chase a reward. It expects to see certain observations, and then acts to bring those observations about. The engine that steers behavior is a distribution called the prior preference, usually written P(o | C), where o is a sensory outcome and C is the agent's characteristic "what it wants to see" bias. Change C, and the same generative model produces a different life.

Preferences live over observations, not states.

A key move in the Parr, Pezzulo and Friston (2022) formulation is that the preference distribution is defined over observations, not over hidden states (Class E). The agent scores futures by how well the observations it expects along a policy match the observations it prefers, measured by the Kullback-Leibler divergence between the distribution over outcomes predicted under that policy and the preference distribution P(o | C). That divergence is the pragmatic (goal-seeking) term inside expected free energy. The complementary term is epistemic, the drive to reduce uncertainty about hidden states.
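In symbols, one common way to write this score for a single future time step is sketched below. The notation is an assumption layered on the text above: Q(o | \pi) is the distribution over outcomes predicted under policy \pi, Q(s | \pi) is the predicted distribution over hidden states, and P(o | s) is the likelihood in the generative model. The two lines are equal when the predicted outcomes are generated through that same likelihood.

```latex
\begin{align}
G(\pi) &= \underbrace{D_{\mathrm{KL}}\big[\,Q(o \mid \pi)\,\|\,P(o \mid C)\,\big]}_{\text{divergence from preferences}}
        + \underbrace{\mathbb{E}_{Q(s \mid \pi)}\big[\,\mathrm{H}[P(o \mid s)]\,\big]}_{\text{expected ambiguity}} \\
       &= -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\,D_{\mathrm{KL}}[\,Q(s \mid o, \pi)\,\|\,Q(s \mid \pi)\,]\,\big]}_{\text{information gain}}
        - \underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\ln P(o \mid C)\big]}_{\text{expected log-preference}}
\end{align}
```

A lower G(\pi) corresponds to a more probable policy. Over a longer horizon, the per-step values are summed along the policy.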

Concretely, an active-inference agent evaluates a policy pi by computing, for each future time step, both what it expects to observe under pi and how far that expectation sits from C. Policies whose predicted observations are closer to C, and whose predicted observations also resolve uncertainty about hidden states, are assigned higher probability. There is no separate reward channel added on top. Preference over observations does all the goal work.
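The evaluation described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation of any particular library or of UNI: the function names, the two-state toy world, and all numbers are hypothetical. It scores each policy as divergence from C plus expected ambiguity (one common way to write expected free energy), then turns the scores into policy probabilities with a softmax.

```python
import math

def predict_obs(A, qs):
    """Q(o | pi) = sum_s P(o | s) Q(s | pi), with A[o][s] = P(o | s)."""
    return [sum(A[o][s] * qs[s] for s in range(len(qs))) for o in range(len(A))]

def kl(q, p):
    """KL divergence D_KL[q || p]; assumes every entry of p is strictly positive."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def ambiguity(A, qs):
    """Expected entropy of the likelihood P(o | s) under Q(s | pi)."""
    total = 0.0
    for s, q in enumerate(qs):
        column = [A[o][s] for o in range(len(A))]
        total += q * -sum(x * math.log(x) for x in column if x > 0)
    return total

def expected_free_energy(A, qs_per_step, C):
    """Sum over future time steps of (divergence from C + ambiguity)."""
    return sum(kl(predict_obs(A, qs), C) + ambiguity(A, qs) for qs in qs_per_step)

def policy_posterior(G, gamma=1.0):
    """Softmax over negative expected free energy; gamma is an assumed precision."""
    lowest = min(G)
    weights = [math.exp(-gamma * (g - lowest)) for g in G]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy world: two hidden states, two observations.
A = [[0.9, 0.1],   # P(o=0 | s=0), P(o=0 | s=1)
     [0.1, 0.9]]   # P(o=1 | s=0), P(o=1 | s=1)
C = [0.1, 0.9]     # the agent expects to see observation 1

# Predicted hidden states one step ahead under two hypothetical policies.
policies = {
    "stay": [[0.9, 0.1]],
    "go":   [[0.1, 0.9]],
}

G = [expected_free_energy(A, steps, C) for steps in policies.values()]
posterior = policy_posterior(G)
for name, g, p in zip(policies, G, posterior):
    print(f"{name}: G={g:.3f}  probability={p:.3f}")
```

In this toy setup the "go" policy predicts observations closer to C, so it receives the lower score and the higher probability. Nothing in the code resembles a reward channel; the only goal information is the vector C.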

Contrast with reward signals.

In a standard reinforcement-learning agent, a scalar reward r(s, a) is delivered from outside the model, and the policy tries to maximize the sum of discounted rewards. Behavior is goal-directed only because the reward has been shaped to point at the goal. In active inference, the same steering job is done differently: preferences are part of the generative model itself, expressed as log-probabilities over observations (Class E). A "goal" is just an observation you strongly expect to see. The agent then infers the actions that make that expected observation likely.

This has a practical consequence. If you want to change what an active-inference agent pursues, you edit C. You do not retrain a value function. In UNI's Precision Lab, the goal cell in the maze is not a reward tile; it is an observation that the agent has been told to expect with high prior probability. Set C flat, and the agent wanders coherently but without destination. Sharpen C on the goal observation, and the same generative model produces goal-seeking behavior. The lab exposes the shape of C as a dial (Class C).
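The effect of that dial can be illustrated with a small sketch. It is not the Precision Lab's code: the `preference` helper, the `sharpness` parameter, and the three-observation example are assumptions made for illustration. C is built as a softmax that is flat at sharpness 0 and concentrates on the goal observation as sharpness grows, and two policies are compared by their divergence from C.

```python
import math

def preference(n_obs, goal, sharpness):
    """Hypothetical dial: softmax preference that peaks on `goal` as sharpness grows."""
    logits = [sharpness if o == goal else 0.0 for o in range(n_obs)]
    z = sum(math.exp(x) for x in logits)
    return [math.exp(x) / z for x in logits]

def kl(q, p):
    """KL divergence D_KL[q || p]; assumes every entry of p is strictly positive."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Predicted observations under two hypothetical policies; index 2 is the goal.
toward = [0.1, 0.1, 0.8]
away = [0.8, 0.1, 0.1]

flat = preference(3, goal=2, sharpness=0.0)
sharp = preference(3, goal=2, sharpness=4.0)

# How much worse "away" scores than "toward" under each setting of the dial.
flat_gap = kl(away, flat) - kl(toward, flat)
sharp_gap = kl(away, sharp) - kl(toward, sharp)

print(f"flat C:  gap = {flat_gap:.3f}")
print(f"sharp C: gap = {sharp_gap:.3f}")
```

With a flat C the two policies score the same on the preference term, so nothing pulls the agent toward the goal. With a sharpened C the policy heading toward the goal scores clearly better, even though the predicted observations themselves have not changed.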

Why this framing matters.

Because preferences and beliefs share a probabilistic language, an active-inference agent can trade off "what I want to see" against "what I am uncertain about" inside a single expected free energy quantity. That is why the same math describes an animal foraging under partial observability, a controller stabilizing a service cell under disturbance, and a clinician reasoning about treatment sequences. The steering knob is the prior preference. The exploration knob is uncertainty about hidden states. They are two terms in one objective, not two competing systems bolted together.
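To make the "two terms in one objective" point concrete, the sketch below writes expected free energy as negative information gain minus expected log-preference. All names and numbers are hypothetical. For brevity it lets each policy bring its own likelihood matrix (a clear cue versus a noisy cue), which is a simplifying assumption rather than a standard modeling choice. The preference C is flat, so any difference between the policies comes from the epistemic term alone.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def predict_obs(A, qs):
    """Q(o | pi) = sum_s P(o | s) Q(s | pi), with A[o][s] = P(o | s)."""
    return [sum(A[o][s] * qs[s] for s in range(len(qs))) for o in range(len(A))]

def info_gain(A, qs):
    """Epistemic term: H[Q(o)] minus the expected entropy of P(o | s)."""
    conditional = sum(
        q * entropy([A[o][s] for o in range(len(A))]) for s, q in enumerate(qs)
    )
    return entropy(predict_obs(A, qs)) - conditional

def pragmatic_value(A, qs, C):
    """Pragmatic term: expected log-preference of the predicted observations."""
    return sum(q * math.log(c) for q, c in zip(predict_obs(A, qs), C))

def expected_free_energy(A, qs, C):
    """One objective containing both terms; lower is better."""
    return -info_gain(A, qs) - pragmatic_value(A, qs, C)

qs = [0.5, 0.5]                        # the agent is unsure which state it is in
C_flat = [0.5, 0.5]                    # no preferred observation
A_clear = [[0.95, 0.05], [0.05, 0.95]] # hypothetical informative cue
A_noisy = [[0.5, 0.5], [0.5, 0.5]]     # hypothetical uninformative cue

ig_clear = info_gain(A_clear, qs)
ig_noisy = info_gain(A_noisy, qs)
G_clear = expected_free_energy(A_clear, qs, C_flat)
G_noisy = expected_free_energy(A_noisy, qs, C_flat)

print(f"clear cue: info gain={ig_clear:.3f}  G={G_clear:.3f}")
print(f"noisy cue: info gain={ig_noisy:.3f}  G={G_noisy:.3f}")
```

Here both policies have the same pragmatic value, so the policy that looks at the clear cue wins purely because it is expected to reduce uncertainty. Sharpening C would add a pragmatic difference to the same sum, without any second system being introduced.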

In our own words, on a Themesis resource map that catalogs entry points into active inference (Where to Start with Active Inference, A Resource Map for 2026): it is a helpful survey of how newcomers enter the field, with SWU listed as one of several pathways. This is a factual note, not an endorsement.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

Generative Models, the organism's model of its world ›

The model that carries P(o | C) also carries the beliefs it is being steered against.

Expected Free Energy and Goal-Directed Action ›

The full objective: pragmatic term (KL to C) plus epistemic term (information gain).

Epistemic vs Pragmatic Value ›

The two drives inside every policy score, and why an agent explores instead of only exploiting.

The workshop ›

A tightly qualified engagement where these ideas meet a real delivery problem.

Foundations. Parr, T., Pezzulo, G., and Friston, K. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press. Preferences over observations and the pragmatic term of expected free energy are treated in chapters 2 and 7. Cited, not hosted.