UNI is one idea taken seriously: minds and organizations get by predicting what is about to happen, sensing what actually happens, and acting to keep the gap small. Here is the math, the labs, the benchmark that tries to prove us wrong, and a server any language model can call.
A plain-spoken review of active inference and the free energy principle, the idea that a system maintains itself by minimizing variational free energy, an upper bound on surprise, through both perception and action. The labs on this site are the math made steerable. Cite this as a preprint: it is not yet peer reviewed.
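The bound in that sentence can be checked numerically. For a hidden state s and an observation o, the variational free energy of an approximate posterior q is F = E_q[ln q(s) − ln p(o, s)] = KL[q(s) ‖ p(s | o)] − ln p(o). Because the KL term is never negative, F ≥ −ln p(o), the surprise, with equality when q is the exact posterior. The sketch below assumes a toy two-state discrete model. The state names, prior, and likelihood numbers are invented for illustration and are not taken from the labs.

```python
import math

def free_energy(q, prior, likelihood, o):
    """Variational free energy F = sum_s q(s) * [ln q(s) - ln p(o, s)]."""
    total = 0.0
    for s, qs in q.items():
        if qs > 0:  # convention: 0 * ln 0 contributes nothing
            joint = prior[s] * likelihood[s][o]  # p(o, s) = p(s) p(o | s)
            total += qs * (math.log(qs) - math.log(joint))
    return total

def evidence(prior, likelihood, o):
    """Model evidence p(o) = sum_s p(s) p(o | s)."""
    return sum(prior[s] * likelihood[s][o] for s in prior)

def surprise(prior, likelihood, o):
    """Surprise is the negative log evidence, -ln p(o)."""
    return -math.log(evidence(prior, likelihood, o))

def exact_posterior(prior, likelihood, o):
    """Bayes' rule: p(s | o) = p(s) p(o | s) / p(o)."""
    z = evidence(prior, likelihood, o)
    return {s: prior[s] * likelihood[s][o] / z for s in prior}

# Hypothetical toy model: is a service healthy or degraded, given a latency alarm?
prior = {"healthy": 0.9, "degraded": 0.1}
likelihood = {
    "healthy": {"alarm": 0.05, "quiet": 0.95},
    "degraded": {"alarm": 0.7, "quiet": 0.3},
}
o = "alarm"

S = surprise(prior, likelihood, o)
F_guess = free_energy({"healthy": 0.5, "degraded": 0.5}, prior, likelihood, o)
F_exact = free_energy(exact_posterior(prior, likelihood, o), prior, likelihood, o)

print(f"surprise      = {S:.4f}")
print(f"F (q = 50/50) = {F_guess:.4f}")  # strictly above the surprise
print(f"F (exact q)   = {F_exact:.4f}")  # matches the surprise up to rounding
```

Perception, in this picture, is moving q toward the posterior to close the gap between F and surprise; action changes o itself so that the surprise term stays small.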
Zero backend, zero dependencies. Move the dials and watch inference change in front of you.
The Cell Lab is an open falsification benchmark. Five claims were written down with their falsification criteria before the runs, and every loss is recorded as plainly as every win. UNI scores higher than a random controller in 7 of 7 disturbance families (significantly in 6 of 7), higher than the rule-based SRE controller in 6 of 7, and higher than a neural baseline in 5 of 7. It loses three comparisons, all visible in the table below: to the rule-based controller on database_flaky, and to the neural baseline on memory_leak and cpu_noisy_neighbor. A single active-inference controller is not universally best, which is exactly the kind of disconfirmation the benchmark is designed to surface.
| Disturbance family | UNI | rule-based | random | neural | Notable |
|---|---|---|---|---|---|
| traffic_spike | 0.969 | 0.960 | 0.880 | 0.924 | UNI wins vs random (sig) |
| memory_leak | 0.740 | 0.731 | 0.676 | 0.810 | neural wins overall |
| bad_deploy | 0.937 | 0.895 | 0.621 | 0.704 | UNI wins (sig) |
| database_flaky | 0.759 | 0.803 | 0.675 | 0.694 | rule-based wins overall |
| cache_down | 0.664 | 0.641 | 0.518 | 0.588 | UNI wins (sig) |
| cpu_noisy_neighbor | 0.749 | 0.740 | 0.703 | 0.824 | neural wins; UNI vs random not significant |
| observability_loss | 0.992 | 0.988 | 0.961 | 0.974 | UNI wins (sig) |
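The headline counts can be re-derived from the table itself. This sketch compares the listed scores only; it cannot reproduce the significance marks, which need the per-seed results that the table does not show.

```python
# RecoveryScore per disturbance family, copied from the table:
# (UNI, rule-based, random, neural)
scores = {
    "traffic_spike":      (0.969, 0.960, 0.880, 0.924),
    "memory_leak":        (0.740, 0.731, 0.676, 0.810),
    "bad_deploy":         (0.937, 0.895, 0.621, 0.704),
    "database_flaky":     (0.759, 0.803, 0.675, 0.694),
    "cache_down":         (0.664, 0.641, 0.518, 0.588),
    "cpu_noisy_neighbor": (0.749, 0.740, 0.703, 0.824),
    "observability_loss": (0.992, 0.988, 0.961, 0.974),
}
baselines = ("rule-based", "random", "neural")

wins = {b: 0 for b in baselines}
losses = []
for family, (uni, *others) in scores.items():
    for name, score in zip(baselines, others):
        if uni > score:
            wins[name] += 1
        else:
            # Ties would also land here; this table has none.
            losses.append((family, name))

print(wins)    # UNI's win count against each baseline
print(losses)  # (family, baseline) pairs where UNI did not score higher
```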
RecoveryScore is the fraction of ticks spent inside the viable set, weighted by excursion depth. Scores come from the committed cache: depth 2, 6 seeds, 80 ticks. "sig" means that a bootstrap 95% confidence interval for the median paired difference excludes 0.
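The "sig" test can be sketched as a paired bootstrap. The page does not say which bootstrap variant is used, so this assumes the simple percentile method; the per-seed scores are invented for illustration (six values each, to match the 6 seeds), and the function names are hypothetical.

```python
import random
import statistics

def bootstrap_median_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median of paired differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(diffs)
    # Resample the differences with replacement and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def is_significant(a, b, **kwargs):
    """True when the interval for the median paired difference excludes 0."""
    lo, hi = bootstrap_median_diff_ci(a, b, **kwargs)
    return lo > 0 or hi < 0

# Hypothetical per-seed RecoveryScores for one disturbance family.
uni = [0.95, 0.97, 0.96, 0.98, 0.97, 0.96]
baseline = [0.88, 0.90, 0.86, 0.89, 0.87, 0.88]

print(bootstrap_median_diff_ci(uni, baseline))
print(is_significant(uni, baseline))
```

Pairing by seed means each difference compares two controllers facing the same disturbance realization, which removes seed-to-seed variation from the comparison. With only six pairs the bootstrap distribution is coarse, so a borderline result is best read as weak evidence either way.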
The deployment exposes a public, anonymous Model Context Protocol (MCP) server. Any MCP-capable LLM client can introspect, simulate, and drive the labs. No auth, no token, just the URL.
16 tools in two groups:
- Headless, running server-side now (6 tools): list_labs, list_mazes, describe_dial, run_episode, run_sweep, compare_labs.
- Live, driving a user's open lab tab (10 tools): attach_session, detach_session, set_dial, switch_maze, set_planning_depth, set_action_mode, step_agent, auto_run, reset, read_state.
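Most users will simply point an MCP client at the server URL. For readers who want to see what goes over the wire, MCP messages are JSON-RPC 2.0, and this sketch builds the usual opening sequence: initialize, the initialized notification, tools/list, then a tools/call for list_labs. It only prints the messages and does not contact the server. The client name, the protocol version string chosen, and the empty arguments object for list_labs are assumptions; the real input schema for each tool is whatever tools/list returns, and how the messages are POSTed depends on the server's transport.

```python
import json

def rpc(method, params=None, msg_id=None):
    """Build a JSON-RPC 2.0 message; requests carry an id, notifications do not."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return msg

session = [
    # Handshake: the client announces its protocol version and capabilities.
    rpc("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
    }, msg_id=1),
    # Notification (no id): the client confirms initialization is complete.
    rpc("notifications/initialized"),
    # Discover the tools and their input schemas.
    rpc("tools/list", msg_id=2),
    # Call a headless tool; empty arguments are an assumption, not the documented schema.
    rpc("tools/call", {"name": "list_labs", "arguments": {}}, msg_id=3),
]

for msg in session:
    print(json.dumps(msg))
```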
Machine-readable index for crawlers and agents: llms.txt and llms-full.txt.
Not a clinical tool, not a diagnostic instrument, not therapy or treatment advice, and not evidence that active inference is the correct theory of mental health. Behavioral labels are candidate computational phenotypes: hypotheses, not diagnoses. The preprint is not yet peer reviewed. We would rather be correct than impressive.