Why Generative Models Are Not Neural Networks

Two objects get conflated almost every week, and the confusion costs us real work. A generative model is a joint probability distribution over hidden causes and the observations they produce. A neural network is a family of parameterised functions that map inputs to outputs. Composed, sometimes. Identical, never.

The generative model, precisely

In active inference the generative model is written as a joint distribution over hidden states, observations, actions, and parameters, typically P(o, s, a, theta). It is a claim about the world: which causes exist, how they evolve, what they emit. In the discrete POMDP formulation used across the UNI labs, that claim is carried by the A, B, C, and D matrices (likelihoods, transitions, preferences, priors) as laid out in Parr, Pezzulo and Friston, Active Inference, MIT Press 2022 (Class E). The generative model is the organism's model of its world, nothing more and nothing less.
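To make the four matrices concrete, here is a minimal sketch of a two-state, two-observation, two-action world in plain Python. Every number is illustrative and hypothetical, not taken from the UNI labs. The index conventions (A[o][s], B[a][s_next][s]) and the choice to store C as a normalised preference distribution are assumptions of this sketch; some implementations store C as log-preferences instead. For brevity the joint shown covers only observations and states at the first time step, not the full P(o, s, a, theta).

```python
# Hypothetical toy world: 2 hidden states, 2 observations, 2 actions.
# All values are illustrative assumptions, not UNI lab parameters.

# A[o][s] = P(o | s): likelihood of each observation given each hidden state.
A = [[0.9, 0.2],
     [0.1, 0.8]]

# B[a][s_next][s] = P(s_next | s, a): one transition matrix per action.
B = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
     [[0.0, 1.0], [1.0, 0.0]]]   # action 1: switch to the other state

# C[o]: preferred distribution over observations (assumed normalised here).
C = [0.8, 0.2]

# D[s] = P(s): prior over the initial hidden state.
D = [0.5, 0.5]


def joint(o, s):
    """P(o, s) = P(o | s) * P(s) at the first time step."""
    return A[o][s] * D[s]


def is_column_stochastic(matrix, tol=1e-9):
    """True if every column sums to one, i.e. each column is a distribution."""
    n_cols = len(matrix[0])
    return all(abs(sum(row[c] for row in matrix) - 1.0) < tol
               for c in range(n_cols))


# The joint over all (o, s) pairs must sum to one if the model is well formed.
total = sum(joint(o, s) for o in range(len(A)) for s in range(len(D)))
```

Note that nothing in this specification is a network. It is four small tables and a product rule.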

Inference over that model is a separate matter. Given observations, an agent computes an approximate posterior Q(s) over hidden causes by minimising variational free energy, an upper bound on surprise. Action selection is inference too: policies are scored by expected free energy, a functional that trades off pragmatic value (do I reach preferred outcomes) against epistemic value (do I reduce uncertainty about causes). None of that mathematics requires a neural network. It requires only that the joint distribution be specified and tractable enough to update.
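The two quantities in this paragraph can be computed directly for a toy model, with no network involved. The sketch below is an illustration under stated assumptions: the matrices are hypothetical, policies are single actions looked one step ahead, C is treated as a normalised preference distribution, and expected free energy is written as the negative sum of epistemic value (expected information gain about states) and pragmatic value (expected log-preference of outcomes). Function names are our own.

```python
import math

# Illustrative two-state, two-observation model (hypothetical numbers).
A = [[0.9, 0.2], [0.1, 0.8]]              # A[o][s] = P(o | s)
B = [[[1.0, 0.0], [0.0, 1.0]],            # B[a][s_next][s], action 0: stay
     [[0.0, 1.0], [1.0, 0.0]]]            # action 1: switch
C = [0.8, 0.2]                            # preferred distribution over o
D = [0.7, 0.3]                            # prior P(s)


def posterior_and_evidence(prior, o):
    """Exact Bayes: returns P(s | o) and P(o) under the given prior."""
    joint = [A[o][s] * prior[s] for s in range(len(prior))]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence


def free_energy(q, prior, o):
    """Variational free energy F[Q] = sum_s Q(s) * (ln Q(s) - ln P(o, s))."""
    return sum(q[s] * (math.log(q[s]) - math.log(A[o][s] * prior[s]))
               for s in range(len(q)) if q[s] > 0)


def expected_free_energy(action):
    """One-step G = -(epistemic value) - (pragmatic value)."""
    n = len(D)
    # Predicted state distribution after taking the action.
    q_s = [sum(B[action][s2][s] * D[s] for s in range(n)) for s2 in range(n)]
    epistemic = 0.0
    pragmatic = 0.0
    for o in range(len(A)):
        post, q_o = posterior_and_evidence(q_s, o)
        # Information gain: KL[Q(s | o) || Q(s)], weighted by predicted Q(o).
        epistemic += q_o * sum(p * math.log(p / q_s[s])
                               for s, p in enumerate(post) if p > 0)
        # Expected log-preference of the predicted observation.
        pragmatic += q_o * math.log(C[o])
    return -epistemic - pragmatic


observed = 0
exact_q, evidence = posterior_and_evidence(D, observed)
surprise = -math.log(evidence)
f_at_prior = free_energy(D, D, observed)            # Q set to the prior
f_at_posterior = free_energy(exact_q, D, observed)  # Q set to exact posterior
G = [expected_free_energy(a) for a in range(len(B))]
```

In this toy case free energy at the prior sits above surprise and touches it when Q equals the exact posterior, which is what "upper bound" means in practice. The policy with the lower G is the one an agent with these preferences would favour.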

The neural network, precisely

A neural network is a function approximator: a stack of linear maps and pointwise non-linearities whose weights are chosen to minimise a loss. It has no privileged relationship to hidden causes. It represents whatever the loss and data teach it to represent. Given the same data, one architecture may parameterise a classifier, another a density estimator, another a policy, another an autoencoder that resembles a generative model. The network is the parameter machinery, not the statistical object.
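As a minimal sketch of "parameter machinery", the following forward pass is nothing more than a stack of affine maps and tanh non-linearities. The weights are arbitrary, hand-picked numbers chosen for illustration, and the names mlp and softmax are our own. The point it illustrates is that the same raw outputs acquire a statistical reading only from the loss they would be trained under.

```python
import math


def mlp(x, weights, biases):
    """Forward pass: affine map then tanh per layer, last layer left linear."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if i < len(weights) - 1:
            h = [math.tanh(v) for v in h]
    return h


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


# Arbitrary illustrative parameters: a 2 -> 3 -> 2 network.
weights = [[[0.5, -0.3], [0.8, 0.1], [-0.2, 0.4]],
           [[0.3, -0.6, 0.2], [-0.1, 0.7, 0.5]]]
biases = [[0.0, 0.1, -0.1], [0.05, -0.05]]

raw = mlp([1.0, 2.0], weights, biases)

# Same numbers, two readings, depending on the training objective:
as_class_probs = softmax(raw)  # P(label | x) if trained with cross-entropy
as_regression = raw            # a point prediction if trained with squared error
```

Neither reading mentions hidden causes. Whether the outputs mean anything about latent states depends on a model specified outside the network.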

The confusion has a source. Modern deep generative models (variational autoencoders, diffusion models, autoregressive language models) do use neural networks to parameterise pieces of a joint distribution (Class C). The network approximates the likelihood, the prior, or the posterior. That composition is legitimate and productive. But the network is the tool. The joint distribution is the claim. Swapping the tool does not swap the claim, and swapping the claim does not require swapping the tool.

Where the confusion causes trouble

Three failure modes recur. First, people read "generative model" in an active-inference paper and quietly assume a deep network is required, then reject the framework because they cannot train one on their tiny dataset. The UNI Precision Lab shows a working active-inference agent with no neural network anywhere in the loop, running in your browser, driven by hand-specified matrices (Class C). The math does not need the network.

Second, people read "neural network" in a machine-learning paper and assume the network's outputs are a posterior over hidden causes. They usually are not. A classifier's softmax is a conditional distribution over labels, not a variational posterior over latent states in a generative model. Confusing the two obscures what the model actually claims.

Third, people conflate "the brain is a neural network" (a statement about biological wetware) with "the brain implements a neural network in the machine-learning sense" (a much stronger and largely unsupported claim). The free-energy-principle literature is careful to keep these separate (Class E). The organism is claimed to carry a generative model of its niche. Whether the substrate that carries it looks anything like a network trained by backpropagation is an open empirical question.

Where composition is useful

Nothing here says the two cannot meet. In UNI's roadmap, larger state spaces will need learned components: an amortised recognition network to approximate Q(s|o), or a learned transition kernel where the state count outruns hand specification. That is a neural network parameterising a piece of a generative model, which is exactly the sanctioned composition. The generative model still owns the semantics. The network still owns the fitting. Keeping the roles separate keeps the debugging tractable.
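The composition can be sketched at toy scale. Below, a deliberately minimal recognition "network" (one linear layer on a one-hot observation, followed by a softmax) is fitted by gradient descent on variational free energy, and its output Q(s|o) is compared with the exact posterior of a hand-specified model. Everything here is an assumption made for illustration: the matrices, the learning rate, the step count, and the function names. It is not UNI's roadmap implementation. With one-hot inputs this layer reduces to a lookup table, so treat it as the smallest possible instance of amortisation rather than a realistic architecture.

```python
import math

# Hand-specified generative model (illustrative numbers).
A = [[0.9, 0.2], [0.1, 0.8]]   # A[o][s] = P(o | s)
D = [0.5, 0.5]                 # prior P(s)
N_OBS, N_STATES = len(A), len(D)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def recognise(W, o):
    """Q(s | o): linear layer applied to a one-hot observation, then softmax."""
    return softmax([W[s][o] for s in range(N_STATES)])


def free_energy(q, o):
    """F = sum_s Q(s) * (ln Q(s) - ln P(o, s)); the model supplies P(o, s)."""
    return sum(q[s] * (math.log(q[s]) - math.log(A[o][s] * D[s]))
               for s in range(N_STATES))


def train(steps=500, lr=0.5):
    """Fit the recognition weights by gradient descent on free energy."""
    W = [[0.0] * N_OBS for _ in range(N_STATES)]
    for _ in range(steps):
        for o in range(N_OBS):
            q = recognise(W, o)
            f = free_energy(q, o)
            for s in range(N_STATES):
                g = math.log(q[s]) - math.log(A[o][s] * D[s])
                # Analytic gradient of F with respect to logit s: Q(s) * (g - F).
                W[s][o] -= lr * q[s] * (g - f)
    return W


W = train()
learned = [recognise(W, o) for o in range(N_OBS)]
exact = [[A[o][s] * D[s] / sum(A[o][t] * D[t] for t in range(N_STATES))
          for s in range(N_STATES)] for o in range(N_OBS)]
```

The division of labour is visible in the code: A and D define what the posterior should be, while W and the training loop only decide how it is approximated. Replacing the recognition layer with a deeper network would change train and recognise, not the meaning of Q(s|o).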

For field context on why the semantic distinction matters right now, see the Themesis April 2026 field note on the shifting modelling landscape. In our reading, it explains at field level why the General Natural Intelligence (GNI) framing matters now. We link, we do not paraphrase.

The shortest working test

If you cannot write down the joint distribution your system implies, you do not yet have a generative model; you have a function. If you can write it down but cannot say how you approximate the posterior, you have a claim without inference. If you have both and can then choose to represent either piece with a neural network, you have earned the composition. UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, labelled by evidence class, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

Foundational references: Parr, Pezzulo and Friston (2022), Active Inference: The Free Energy Principle in Mind, Brain, and Behavior, MIT Press. Composition patterns (VAE, diffusion, autoregressive) are cited to the standard literature, not hosted here. Evidence classes used above: E (expert citation), C (code and inspection).