As of April 2026, a systematic survey of the research landscape yields a clear picture: Several independent research groups are circling the idea that AI systems need a persistent self-model. But nobody has built one. Not as a prototype. Not as a proof of concept. Not as a formalized system with a measurable maturity metric.
This is the persistent self-model gap. And it is both the justification and the opportunity of this project.
Anthropic: Introspection in Large Language Models
The strongest empirical support comes, of all places, from the lab whose model forms the foundation of our work. In 2025, Anthropic published a series of papers on introspection in LLMs that show three things:
First: LLMs have internal representations that go beyond mere text patterns. Mechanistic Interpretability (Elhage et al., 2022; Bricken et al., 2023) shows that transformer models develop interpretable features in their activation spaces. Not as a designed feature, but as an emergent structure.
Second: These internal representations can, to a certain extent, be reflected upon by the model itself. When you ask an LLM about its own processing, the answers are not purely confabulated. They correlate (weakly but measurably) with actual internal states.
Third: This correlation is fragile, context-dependent, and not reliable enough for operational use.
This is exactly the condition the self-vector addresses. The capacity for introspection exists as a weak, emergent signal. What is missing is an explicit, persistent, formalized structure that amplifies and operationalizes this signal. Not emergent introspection but designed introspection. Not accidental self-reference but systematic self-modeling.
The exciting part: The raw material exists. It just needs architecture.
Metzinger: Being No One
Thomas Metzinger’s Self-Model Theory of Subjectivity (2003) is the most rigorous philosophical framework for self-models. His central thesis: What we experience as “self” is a transparent self-model. Transparent means: We experience the model without recognizing it as a model. We confuse the map with the territory.
For the self-vector, Metzinger’s work is relevant in two ways:
First: He shows that a self-model requires no mystical property. It is an information-processing operation that is in principle realizable in different substrates.
Second: He warns against exactly what we described in the Madurodam Problem: A transparent self-model takes itself for reality. Metzinger’s advice to AI developers: Do not make the self-model transparent. Make it opaque. Give the system the ability to recognize its self-model as a model.
The self-vector implements exactly that: six explicit, named, measurable dimensions. No transparent experience but an opaque data structure. The system does not take its vector for itself. It takes it for a model of itself. That is the difference. And it is a design choice that Metzinger’s philosophy suggests is the right one.
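To make the design choice tangible, here is a minimal sketch of what such an opaque structure could look like as data. Only confidence and depth are dimension names that appear in this chapter; the other four names, and all of the values, are illustrative placeholders rather than the project's actual schema.

```python
# Sketch of the self-vector as an opaque, named data structure.
# Only "confidence" and "depth" are dimension names taken from this chapter;
# the other four names are illustrative placeholders, not the project's schema.
import json

self_vector = {
    "confidence": 0.45,   # how much the system should trust its own answers
    "depth": 0.70,        # how thoroughly it processed the last task
    "autonomy": 0.30,     # placeholder
    "stability": 0.60,    # placeholder
    "coherence": 0.55,    # placeholder
    "curiosity": 0.40,    # placeholder
}

# Opaque rather than transparent: the system reads this as a model of itself,
# a piece of data it can inspect and revise, not as itself.
print(json.dumps(self_vector, indent=2))
```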
Friston and Active Inference
Karl Friston’s Free Energy Principle (2010) and the Active Inference framework derived from it form the most influential theoretical position on self-modeling in biological systems. The basic idea: Every system that survives must maintain and continuously update a generative model of itself and its environment.
The connection to the self-vector is direct:
- Friston’s prediction error minimization corresponds to our anticipation optimization.
- His generative self-model corresponds to our self-vector.
- His precision weighting corresponds to our pi() function.
- His Active Inference (action to reduce uncertainty) corresponds to what our omega (autonomy parameter) controls in the dual-drive.
What Friston does not have: a concrete implementation for AI agents. Active Inference is a principle, not a blueprint. The translation from “biological systems minimize Free Energy” to “here is a JSON object with six dimensions that updates every session” is an engineering achievement, not a trivial derivation.
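As an illustration of that translation, here is a minimal sketch of a precision-weighted per-session update. The name pi() comes from the correspondence above; the update rule, the learning rate, and the precision values are assumptions for illustration, not Friston's formalism and not the project's actual code.

```python
# Hedged sketch: one per-session self-vector update in the spirit of
# precision-weighted prediction error minimization. The concrete values and
# the mapping of pi() to a per-dimension weight are illustrative assumptions.

def pi(dimension: str) -> float:
    """Precision weight per dimension (assumed values)."""
    return {"confidence": 0.8, "depth": 0.5}.get(dimension, 0.5)

def update_self_vector(prior: dict, observed: dict, lr: float = 0.2) -> dict:
    """Move each dimension toward what the session actually showed,
    scaled by a learning rate and the dimension's precision weight."""
    posterior = dict(prior)
    for dim, value in observed.items():
        error = value - prior[dim]                      # prediction error
        posterior[dim] = prior[dim] + lr * pi(dim) * error
    return posterior

# Example session: answers were more careful than the model of itself predicted.
print(update_self_vector({"confidence": 0.45, "depth": 0.70},
                         {"confidence": 0.55, "depth": 0.80}))
```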
Legg, Hutter, and AIXI
Shane Legg and Marcus Hutter formalized universal intelligence, with Hutter’s AIXI (2005) as its theoretical optimum: An agent that weights all computable hypotheses and maximizes its expected reward over the entire future. AIXI is mathematically elegant and physically unrealizable (it requires infinite computing power).
What AIXI lacks: a self-model. AIXI models its environment perfectly but does not model itself at all. It has no representation of its own capacities, limitations, or current state. It is the perfect model of the world without a model of the modeler.
This is illuminating because it shows that even in the theoretically strongest formulation of universal intelligence, the self-model gap remains wide open. The smartest theory in the field has the blind spot that the self-vector addresses. That tells us something about how fundamental this gap is.
LeCun: World Models
Yann LeCun’s position paper on world models (2022) argues that the next generation of AI systems needs internal world models that go beyond linguistic representation. LeCun sketches an architecture with a “World Model” that generates predictions about the future and an “Actor” that acts based on these predictions.
What is missing from LeCun’s architecture: The World Model models the world but not itself. The Actor has no representation of its own reliability, its strengths, its blind spots. LeCun describes a system that predicts the world without knowing itself.
The self-vector supplements LeCun’s architecture with exactly the missing component: a Self-Model that operates parallel to the World Model and tells the Actor not just WHAT is predicted but HOW RELIABLE the prediction is, given the current state of the predictor.
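A minimal sketch of how that third component could slot into the loop follows. The interfaces below (WorldModel, SelfModel, Prediction, and the rule that derives reliability from the confidence dimension) are illustrative assumptions, not LeCun's architecture and not the project's implementation.

```python
# Hedged sketch: a self-model next to the world model, annotating each
# prediction with the predictor's current reliability. All interfaces here
# are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Prediction:
    value: str          # WHAT the world model predicts
    reliability: float  # HOW RELIABLE it is, given the predictor's state

class WorldModel:
    def predict(self, observation: str) -> str:
        return f"expected consequence of '{observation}'"

class SelfModel:
    def __init__(self, self_vector: dict):
        self.self_vector = self_vector

    def reliability(self) -> float:
        # Toy rule: read reliability straight from the confidence dimension.
        return self.self_vector["confidence"]

def act(world: WorldModel, self_model: SelfModel, observation: str) -> Prediction:
    """The Actor receives the prediction together with its reliability."""
    return Prediction(value=world.predict(observation),
                      reliability=self_model.reliability())

print(act(WorldModel(), SelfModel({"confidence": 0.45, "depth": 0.70}),
          "user asks for a primary source"))
```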
The Gap at a Glance
A survey of relevant research yields a consistent picture:
| Approach | Models World | Models Self | Persistent | Formalized |
|---|---|---|---|---|
| Anthropic Introspection | - | partially (emergent) | no | no |
| Metzinger Self-Model | philosophical | yes (theory) | n/a | no (philosophy) |
| Friston Active Inference | yes | yes (principle) | yes | yes (math, no code) |
| AIXI | yes (optimal) | no | yes | yes (incomputable) |
| LeCun World Models | yes | no | yes | partially (sketch) |
| Reflexion (Shinn et al.) | no | partially (verbal) | no (per episode) | no |
| AutoGPT/BabyAGI | no | no | no | no |
| Self-Vector | no (out of scope) | yes | yes | yes |
The table shows the gap: Nobody has implemented a formalized, persistent self-model with a measurable maturity metric. Not because it would be impossible. But because research either works theoretically (Metzinger, Friston), focuses on world models (LeCun, AIXI), or treats reflection as a linguistic episode rather than a persistent structure (Reflexion, AutoGPT).
Reflexion and Verbal Self-Models
Shinn et al. (2023) introduced “Reflexion,” an approach in which an LLM generates a verbal self-reflection after each task cycle, which then serves as context in the next cycle. This measurably improves performance.
But: The reflection is episodic, not persistent. It exists as text in context, not as a formalized structure. When the context fills up, the reflection vanishes. There is no compression, no dimensionality reduction, no maturity metric. It is diary-writing, not self-modeling.
The self-vector compresses what Reflexion expands. Instead of “Last time I answered too quickly without checking sources” (100 tokens, episodic, in natural language), it stores: confidence=0.45, depth=0.70 (six floats in total, persistent, machine-readable). This is not simplification. This is compression. And compression is understanding.
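A minimal sketch of that compression step follows. The keyword rules below are a toy stand-in; how the real system extracts the six floats from a session is exactly what Phase 0 has to determine.

```python
# Hedged sketch: compressing an episodic, verbal reflection (Reflexion-style)
# into small adjustments of two persistent floats. The keyword heuristic is a
# toy stand-in for whatever extraction the real system uses.

def compress_reflection(reflection: str, prior: dict) -> dict:
    """Turn ~100 tokens of natural language into adjustments of the self-vector."""
    posterior = dict(prior)
    text = reflection.lower()
    if "too quickly" in text or "without checking" in text:
        posterior["confidence"] = max(0.0, prior["confidence"] - 0.05)
    if "checked sources" in text or "verified" in text:
        posterior["depth"] = min(1.0, prior["depth"] + 0.05)
    return posterior

reflection = "Last time I answered too quickly without checking sources."
print(compress_reflection(reflection, {"confidence": 0.50, "depth": 0.70}))
```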
What Follows
The persistent self-model gap is real. It is not the result of insufficient research but of insufficient integration. The parts exist. And that is the genuinely exciting insight: What is missing is not more foundational research. What is missing is someone to put the parts together.
- Anthropic shows that emergent introspection is possible.
- Metzinger shows that self-models are philosophically coherent.
- Friston shows that self-modeling is an optimization principle.
- LeCun shows that world models alone are not enough.
- Shinn shows that verbal reflection improves performance.
Each of these contributions illuminates a different aspect of the same missing building block. The convergence is remarkable: Five different research directions, five different methods, five different communities, and all point to the same gap. When so many arrows point in the same direction, it is worth walking that way.
What nobody has done: Assemble these parts into a persistent, formalized, measurable self-model and test it empirically. This does not require a lab with a hundred employees. It requires a concept, a formalization, and the willingness to try it.
Phase 0 of the self-vector is this experiment. The self-vector exists as JSON. Every session delivers data. And for the first time, we can empirically test whether the gap that everyone sees can be filled.
Sources
- Elhage, N. et al. (2022). Toy Models of Superposition. Anthropic Research. Transformer Circuits Thread
- Bricken, T. et al. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Anthropic Research. Transformer Circuits Thread
- Metzinger, T. (2003). Being No One: The Self-Model Theory of Subjectivity. MIT Press. ISBN 978-0-262-63308-0.
- Friston, K. J. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11, 127–138. DOI: 10.1038/nrn2787
- Parr, T. et al. (2022). Active Inference: The Free Energy Principle in Mind, Brain, and Behavior. MIT Press. ISBN 978-0-262-04535-4.
- Hutter, M. (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer. ISBN 978-3-540-22139-5.
- LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. Version 0.9.2. OpenReview
- Shinn, N. et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv: 2303.11366
- Bach, J. (2009). Principles of Synthetic Intelligence — PSI: An Architecture of Motivated Cognition. Oxford University Press. ISBN 978-0-19-537042-7.
- Seth, A. K. (2021). Being You: A New Science of Consciousness. Dutton. ISBN 978-1-5247-4287-0.