Our Elixir Workbench: Why That Stack, Universal Natural Intelligence

Almost every serious active-inference toolkit in public circulation is written in Python. Ours is written in Elixir. This post explains why, honestly, and where that choice hurts.

UNI is a working hypothesis on an attainable path toward General Natural Intelligence: a natural, active-inference approach whose evidence is growing, evidence-classed, and tested in the open. Do not take the claim on faith. Test the build, inspect the gates, and help us find where it fails.

What we are actually building

An active-inference agent is a discrete-time POMDP loop. On every tick, we update a posterior over hidden states, evaluate expected free energy across candidate policies, and sample an action. In production, we run many such loops at once: multiple labs, multiple sessions, an MCP server driven by outside language models, and a benchmark harness that replays disturbances against a hidden 216-state cell.
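The tick described above can be sketched in a few lines of plain Elixir. This is a minimal illustration, not the workbench's actual code: the module name `TickSketch`, the function names, and the precision parameter `gamma` are all hypothetical, the likelihood matrix is assumed to be stored as a list of rows (one row per observation), and the expected-free-energy values are taken as given rather than computed.

```elixir
defmodule TickSketch do
  @moduledoc "Hypothetical sketch of one tick: belief update, then policy sampling."

  # Bayes update: posterior(s) is proportional to P(o | s) * prior(s).
  # `a` is a likelihood matrix as a list of rows, one row per observation,
  # so row `obs_index` holds P(o | s) for every hidden state s.
  def update_posterior(prior, a, obs_index) do
    likelihood = Enum.at(a, obs_index)

    likelihood
    |> Enum.zip_with(prior, fn l, p -> l * p end)
    |> normalise()
  end

  # Turn expected free energies (lower is better) into policy probabilities
  # with a softmax over -gamma * G. Subtracting the minimum keeps exp/1 stable.
  def policy_probs(efe, gamma \\ 1.0) do
    min_g = Enum.min(efe)

    efe
    |> Enum.map(fn g -> :math.exp(-gamma * (g - min_g)) end)
    |> normalise()
  end

  # Sample an index from a categorical distribution.
  # `u` is a uniform draw in [0, 1); pass it explicitly for reproducible runs.
  def sample(probs, u \\ :rand.uniform()) do
    cumulative = Enum.scan(probs, &(&1 + &2))

    case Enum.find_index(cumulative, fn c -> u < c end) do
      # Guard against rounding that leaves the last cumulative sum below 1.0.
      nil -> length(probs) - 1
      index -> index
    end
  end

  defp normalise(xs) do
    z = Enum.sum(xs)
    Enum.map(xs, &(&1 / z))
  end
end
```

With a flat prior `[0.5, 0.5]` and likelihood rows `[[0.9, 0.2], [0.1, 0.8]]`, observing outcome 0 moves the posterior to roughly `[0.818, 0.182]`. Lists are used here only to keep the sketch dependency-free; for anything larger than a toy state space, the same operations would more plausibly be written against tensors.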

That shape (many long-lived, concurrent, message-passing processes, each holding a small generative model and a Markov blanket around its own state) is exactly the shape the BEAM was designed for (Class E). The Erlang/OTP tradition of supervised processes and let-it-crash isolation grew out of telecoms work in the 1980s (Armstrong, Making reliable distributed systems in the presence of software errors, 2003), and it maps cleanly onto the loop structure of active inference.

What Elixir makes easy

One process per agent, one supervisor per lab. Each POMDP loop is a GenServer with its own state. A crashed inference tick takes down that one process, not the fleet. The supervisor restarts it with a clean prior. This is a natural fit for the Markov-blanket framing, where each agent owns its own conditional independence boundary (Class C).
Concurrency without shared memory. KL divergence updates happen inside a single process. Cross-agent communication is explicit message passing. There is no lock discipline to get wrong.
Hot code loading. We can revise the belief-update code on a running node without restarting the labs the public is currently steering. For a science site that people call over MCP in real time, that is not a small win (Class C).
Phoenix LiveView for the labs. The Precision, Echo, Loop, Heart, and Cell labs are all rendered server-side and updated over a persistent socket. The agent state you see is the same state the MCP tools read. One source of truth, no drift between the UI and the model.
Observability by default. :telemetry, :observer, and per-process message queues make it straightforward to watch precision, free-energy estimates, and policy entropy tick-by-tick without a separate metrics stack.
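The first two points in this list can be made concrete with a small sketch. Everything here is an assumption made for illustration: `AgentSketch` and `LabSketch` are hypothetical names, agents are registered under plain atoms, and beliefs are stored as a list of floats. It is not the workbench's actual process layout, only one way to get "one process per agent, one supervisor per lab" using the standard `GenServer` and `Supervisor` behaviours.

```elixir
defmodule AgentSketch do
  @moduledoc "Hypothetical agent process: holds its own beliefs, nothing is shared."
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.fetch!(opts, :name))
  end

  # Public API: all cross-process access goes through explicit messages.
  def beliefs(agent), do: GenServer.call(agent, :beliefs)
  def observe(agent, likelihood), do: GenServer.call(agent, {:observe, likelihood})

  @impl true
  def init(opts) do
    prior = Keyword.fetch!(opts, :prior)
    # A (re)started agent always begins from its prior.
    {:ok, %{beliefs: prior}}
  end

  @impl true
  def handle_call(:beliefs, _from, state), do: {:reply, state.beliefs, state}

  def handle_call({:observe, likelihood}, _from, state) do
    unnormalised = Enum.zip_with(likelihood, state.beliefs, &(&1 * &2))
    z = Enum.sum(unnormalised)
    # Dividing by zero raises ArithmeticError and crashes only this process.
    beliefs = Enum.map(unnormalised, &(&1 / z))
    {:reply, beliefs, %{state | beliefs: beliefs}}
  end
end

defmodule LabSketch do
  @moduledoc "Hypothetical lab supervisor: one child per agent, restarted independently."
  use Supervisor

  # `agents` is a keyword list of name: prior pairs.
  def start_link(agents), do: Supervisor.start_link(__MODULE__, agents)

  @impl true
  def init(agents) do
    children =
      for {name, prior} <- agents do
        # Each child needs a unique id because they share one module.
        Supervisor.child_spec({AgentSketch, name: name, prior: prior}, id: name)
      end

    # :one_for_one restarts only the child that crashed.
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

Starting a lab with `LabSketch.start_link(a: [0.5, 0.5], b: [0.5, 0.5])` gives two isolated agents. If `AgentSketch.observe(:a, [0.0, 0.0])` is called, the normalising constant is zero, the update raises, and agent `:a` exits; the supervisor then starts a fresh `:a` from its prior while `:b` keeps running untouched. The caller sees the exit as well, so real client code would need to decide how to handle it.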

What Elixir makes hard

Nothing is free. Being honest about the costs is part of the discipline.

Numerics. The BEAM is not a numerical VM. For dense linear algebra we lean on Nx, EXLA, and (where the math warrants) NIFs into C. That works, and Nx has matured a lot, but the Python side of the world still ships more autodiff, more solvers, and more community-tested probabilistic programming.
Notebook culture. Livebook is genuinely good, and it is not Jupyter. If a colleague hands you a folder of active-inference notebooks, they are almost certainly in Python. That is a real integration cost we have to pay at every collaboration boundary.
Talent pool. Fewer people write Elixir than write Python. Onboarding takes longer. We accept that trade because the runtime characteristics matter more to us than the hiring pipeline.
Library gravity. Parr, Pezzulo and Friston (2022) is the primary reference for the math (Class E). The reference implementations that surround the book are Python. We reimplement, cross-check numerically, and cite openly rather than pretending our workbench is the canonical one.

How this fits alongside the Python community

The active-inference field has a public, teach-in-Python center of gravity, and that is a good thing. Themesis runs a hands-on course, Building Active Inference in Python, that walks through the same math from the Python side (Class E). That is a complementary track to what we do here: a different stack, aimed at hands-on Python fluency, and worth its own recommendation on its own terms.

Our position is narrow: for a supervised, always-on workbench that many agents can call over MCP while humans steer sliders in a browser, we picked the runtime the shape of the workload actually rewards. For an introductory course in the math, or for a lab notebook that has to interoperate with the rest of scientific Python, we point people at Python without hedging.

The workbench is transparent, by design

Every dial you can move in a lab corresponds to a variable in the generative model. Every action the agent selects is logged with the free-energy estimate that produced it. The Cell Lab benchmark ships with a committed cache and pre-registered falsification criteria, and it records losses as plainly as wins. The workbench is the argument.

Encoding a generative model in Elixir ›

How the A, B, C, D matrices land as process state, and why that shape matches OTP.

Action selection in the UNI workbench ›

Expected free energy, policy sampling, and where the numerics live.

Transparency page ›

What we publish, what we hold back, and why. The evidence classes, in one place.

The Science ›

Preprint, labs, benchmark, and the public MCP server.