Generative models, cluster note 04

Factorization, Time, and Hierarchy in Generative Models

The joint distribution over everything an organism could ever observe and everything that could ever stay hidden from it is astronomically large. You cannot store it, sample it, or minimize free energy against it. What you can do, and what active inference actually does, is factorize it.

A generative model in the sense used by Parr, Pezzulo and Friston (2022) is a joint density over hidden states, observations, and, in the temporally extended case, policies (Class E). Written raw, that joint is intractable in any realistic setting. The engineering (and, on our reading, the biology) is a set of independence assumptions that break the joint into a product of small factors. Two families of factorization matter most: factorization across time steps, and factorization across hierarchical levels.

Factorization across time: the discrete-time Markov assumption

In the discrete-time POMDPs the labs use, we assume that the hidden state at time t depends only on the hidden state at time t-1 and the action just taken, and that the observation at time t depends only on the hidden state at time t (Class E, after Parr, Pezzulo and Friston 2022). This is the first-order Markov property, or equivalently a Markov blanket drawn in time: given its immediate neighbors, a time step is independent of the rest of the trajectory. It is not a claim that the world is truly memoryless. It is a working assumption that lets the joint over a T-step trajectory decompose into a product of local factors: one initial-state prior, T-1 transition factors, and T likelihood factors.
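
Written out, the assumption gives the decomposition below. The symbols (s for hidden states, o for observations, a for actions) are shorthand chosen for this note, not necessarily the notation used in the labs or in the book.

```latex
P(o_{1:T}, s_{1:T} \mid a_{1:T-1})
  = P(s_1)
    \prod_{t=2}^{T} P(s_t \mid s_{t-1}, a_{t-1})
    \prod_{t=1}^{T} P(o_t \mid s_t)
```

No factor on the right-hand side involves more than two adjacent time steps, which is what "local" means here.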

That decomposition is what makes the free-energy objective tractable. Instead of one giant sum over trajectories, you get a sum of local terms, each of which touches only a small neighborhood of variables. The variational posterior can then be factorized to match, one factor per time step, which is the mean-field assumption most active-inference implementations use as their default starting point (Class C, visible in every lab that ships on this site).
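
A minimal sketch of the "sum of local terms" idea, assuming a toy model with two hidden states, two observations, and a single fixed action. The matrices D, B, and A and their values are hypothetical, chosen only for illustration, and are not taken from any lab.

```python
import itertools
import numpy as np

# Hypothetical toy model: 2 hidden states, 2 observations, 1 fixed action.
D = np.array([0.5, 0.5])                 # D[s] = P(s_1)
B = np.array([[0.9, 0.2],
              [0.1, 0.8]])               # B[s_next, s_prev] = P(s_t | s_{t-1})
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])               # A[o, s] = P(o_t | s_t)

def log_joint(states, observations):
    """Log P(o_{1:T}, s_{1:T}) written as a sum of local log factors."""
    total = np.log(D[states[0]])                       # initial-state prior
    for t in range(1, len(states)):
        total += np.log(B[states[t], states[t - 1]])   # one transition per step
    for s, o in zip(states, observations):
        total += np.log(A[o, s])                       # one likelihood per step
    return total

T = 3
# Sanity check: the local factors define a proper joint, so the
# probabilities of all (state, observation) trajectories sum to one.
total_mass = sum(
    np.exp(log_joint(s, o))
    for s in itertools.product(range(2), repeat=T)
    for o in itertools.product(range(2), repeat=T)
)
print(total_mass)
```

The brute-force sum is there only to check the factorization. It enumerates every trajectory, which is exactly the cost the local structure lets real inference avoid.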

The Precision Lab is exactly this structure made steerable. Three dials (sensory precision, transition precision, policy temperature) sit on top of a factorized POMDP, and moving them changes how sharp each local factor is. You can watch the agent's behavior shift as the factorization stays fixed but the confidence in each factor rises or falls.
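
One way to picture a dial changing "how sharp each local factor is" is sketched below. This is an assumption about the general technique (tempering a conditional distribution with an exponent, and a softmax temperature over policies), not a description of how the Precision Lab is implemented. The function names and the numbers are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def apply_precision(factor, precision):
    """Sharpen or flatten each column of a conditional distribution.

    precision = 1 leaves the factor unchanged, precision > 1 sharpens it,
    and precision close to 0 pushes every column toward uniform.
    """
    return softmax(precision * np.log(factor), axis=0)

def column_entropy(factor):
    """Shannon entropy (in nats) of each column."""
    return -(factor * np.log(factor)).sum(axis=0)

def policy_probs(expected_free_energy, temperature):
    """Softmax over policies; lower temperature means more decisive choice."""
    return softmax(-expected_free_energy / temperature)

# Hypothetical likelihood A[o, s] = P(o | s) and per-policy scores G.
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
G = np.array([1.0, 1.5, 3.0])

low = apply_precision(A, 0.5)     # vaguer sensory factor
high = apply_precision(A, 4.0)    # sharper sensory factor
print(column_entropy(low), column_entropy(A), column_entropy(high))
print(policy_probs(G, 2.0), policy_probs(G, 0.5))
```

The same `apply_precision` call could be applied to a transition matrix. In every case the shape of the factorization is untouched and only the confidence inside each factor moves.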

Factorization across hierarchy: slow variables at the top, fast variables at the bottom

The second factorization is over levels. A hierarchical generative model assumes that slow, abstract variables at higher levels generate the parameters of faster, concrete variables at lower levels, and that the levels are conditionally independent given their immediate neighbors (Class E). Context sets policy, policy sets expectation, expectation sets the sensory factor. Nothing at level k needs to know the full state of level k+2 if it has level k+1 as its Markov boundary.
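
A minimal two-level sketch of this pattern, assuming a slow context variable c that selects the transition matrix of a fast state s. It collapses the context, policy, expectation chain above into a single slow-to-fast link for brevity. All names and numbers are hypothetical.

```python
import numpy as np

# Slow level: belief over a context c.
P_context = np.array([0.7, 0.3])

# Fast level: B_given_context[c, s_next, s_prev] = P(s_t | s_{t-1}, c).
B_given_context = np.array([
    [[0.9, 0.1],
     [0.1, 0.9]],      # c = 0: states tend to persist
    [[0.1, 0.9],
     [0.9, 0.1]],      # c = 1: states tend to flip
])

# Sensory level: A[o, s] = P(o | s). Note that c does not appear here.
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

def predict_observation(s_prev_belief, context_belief):
    """P(o_t), computed by passing beliefs down one level at a time."""
    # Slow level to fast level: the context sets the transition parameters.
    B_mixed = np.einsum("c,cij->ij", context_belief, B_given_context)
    s_belief = B_mixed @ s_prev_belief
    # Fast level to sensory level: the likelihood sees only s.
    return A @ s_belief

prediction = predict_observation(np.array([1.0, 0.0]), P_context)
print(prediction)
```

The likelihood never reads the context directly. Everything the slow level knows reaches the observations only through the fast state, which is the conditional independence the paragraph above describes.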

This is why the same math scales from a maze agent to a cardio-renal loop without becoming a different theory. The Heart Lab treats long-term homeostasis as a slow generative process whose expectations shape the faster loops beneath it. The Loop Lab collapses to two states precisely because at that altitude, one slow context variable is all that matters. The factorization pattern is the same, and only the depth changes (Class C).

Why factorization is the load-bearing move for tractable inference

Variational inference works by picking a family of approximate posteriors, then minimizing the KL divergence from that approximate posterior to the true one. If the true posterior does not factorize, no factorized approximation can ever reach it, and the residual free energy is real, not a bug (Class E). The engineering trade is deliberate: you accept a gap that is never negative, and that stays small when the factorization tracks the real dependencies, in exchange for a computation you can actually run.
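
The residual gap can be seen on a two-variable toy problem. The sketch below assumes a 2x2 table standing in for the true posterior and fits a factorized q(x)q(y) by coordinate ascent, which finds a local optimum and is not guaranteed to find the global one. The tables and function names are hypothetical.

```python
import numpy as np

def normalize(log_w):
    """Turn unnormalized log weights into a probability vector."""
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def mean_field_fit(p, n_iters=100):
    """Coordinate ascent on q(x)q(y) to reduce KL[q || p] for a 2-D table p."""
    log_p = np.log(p)
    qx = np.full(p.shape[0], 1.0 / p.shape[0])
    qy = np.full(p.shape[1], 1.0 / p.shape[1])
    for _ in range(n_iters):
        qx = normalize(log_p @ qy)   # uses E_{q(y)}[log p(x, y)]
        qy = normalize(qx @ log_p)   # uses E_{q(x)}[log p(x, y)]
    return qx, qy

def kl_to_target(qx, qy, p):
    """KL[q(x)q(y) || p(x, y)], skipping cells where q is exactly zero."""
    q = np.outer(qx, qy)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# A correlated "true posterior": no product q(x)q(y) can match it.
p_correlated = np.array([[0.5, 0.1],
                         [0.1, 0.3]])
# An independent one: here the factorized family contains the truth.
p_independent = np.outer([0.6, 0.4], [0.7, 0.3])

gap_correlated = kl_to_target(*mean_field_fit(p_correlated), p_correlated)
gap_independent = kl_to_target(*mean_field_fit(p_independent), p_independent)
print(gap_correlated, gap_independent)
```

For the independent table the gap vanishes up to floating-point error. For the correlated table it stays strictly positive however long the loop runs, because the target lies outside the factorized family.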

Two consequences follow. First, precision-weighting matters more than it looks. Because the factors are conditionally independent under the model but not under reality, the only knob you have for down-weighting a factor whose predictions keep failing is its precision. The Loop Lab bifurcation diagram is a visible receipt of this: sensory precision is the upstream variable, not because we said so, but because the factorization made it so (Class A, in-lab).

Second, hierarchy gives you a natural place to put priors that would otherwise contaminate the fast loop. A slow prior about "what kind of situation this is" can live one level up and modulate the lower level through expected-state parameters, instead of being smuggled into the likelihood as a fudge. This is the shape of the argument in the preprint, and it is what the labs are meant to make tangible rather than mystical.

What this does not prove

A tractable factorization is not a claim about how any particular brain factorizes its world. The Class F falsifier is concrete: if a task requires a non-Markov structure the labs cannot represent, the labs will lose to a baseline that can, and we will show it, as the Cell Lab does in 3 of 7 disturbance families. The factorization is a modeling choice with real costs, taken because the alternative is no computation at all. UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Move the dials in the Precision Lab, open the Cell Lab benchmark, and help us find where the factorization breaks.

Evidence tags: Class E expert citation of Parr, Pezzulo and Friston (2022), Active Inference, MIT Press. Class C code-level factorization visible in every lab on this site and in the preprint at DOI 10.5281/zenodo.19785799.