Series: Self-Vector Philosophy (1/4)
Intro
I want to start with an observation that won’t let me go.
When you ask Claude or GPT a question and the answer is wrong, it tends to be wrong in a specific way: convincing, fluent, internally coherent. The system doesn’t hesitate. It doesn’t say: “Wait, I’m not sure about this.” It produces an answer that sounds as though it were carefully considered, and that makes it more dangerous than an obviously wrong answer.
Daniel Kahneman described over twenty years ago why this happens. Not with AI. With humans. But the mechanics are the same.
Two Systems
Kahneman distinguishes System 1 and System 2. System 1 is fast, automatic, intuitive. You see a face and instantly know whether the person is angry. You hear a sentence and understand it without thinking. You drive a car on a familiar route without consciously steering. That is System 1.
System 2 is slow, deliberate, effortful. You calculate 17 times 24 in your head. You draft a letter to someone who matters to you. You check whether an argument is valid. System 2 costs energy. It is uncomfortable. And it is what protects us from the systematic errors of System 1.
The crucial point, which most people who cite Kahneman miss: System 2 is not simply “slower thinking.” System 2 thinks about thinking. It is metacognitive. It asks: Am I too confident? Have I overlooked something? Is my assessment based on data or on a gut feeling that is misleading me?
For this, it needs a model of its own thinking. You must know how you arrived at an assessment in order to question it.
LLMs Are System 1
Now for the projection onto AI systems. Kahneman himself never made it, but it is compelling.
An LLM generates text through next-token prediction. Every token is an implicit decision based on statistical patterns extracted from billions of texts. No conscious rule application. No planning. No checking. Pattern recognition in high-dimensional spaces.
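A minimal sketch of that loop, just to show its shape. The scoring function below is a hardcoded stand-in for a real model's forward pass; the point is only the structure: score, sample, append, repeat, with no separate planning or checking stage.

```python
import random

def score_next_tokens(context: list[str]) -> dict[str, float]:
    # Stand-in for a real model: returns a probability for each candidate token.
    # In an actual LLM these numbers come from patterns learned over billions of texts.
    return {"the": 0.4, "a": 0.3, "answer": 0.25, ".": 0.05}

def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = score_next_tokens(tokens)
        # The "decision" is the sampling itself; nothing verifies it afterwards.
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_token)
        if next_token == ".":
            break
    return tokens

print(" ".join(generate(["The"])))
```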
This is functionally identical to what Kahneman calls “intuition”: fast, automatic judgments that are usually right but produce systematic errors. Availability heuristic: what is easily retrievable is considered probable. Anchoring effect: the first piece of information influences all subsequent ones. Overconfidence bias: the system is more certain than the data warrants.
Anyone who works regularly with LLMs knows these patterns. The system doesn’t hallucinate randomly. It hallucinates plausibly. Because plausibility is exactly what System 1 optimizes for.
Where Is System 2?
The industry’s answer is reasoning models. Chain-of-thought. Extended thinking. The system is forced to articulate intermediate steps before answering. And it helps, measurably.
But it doesn’t solve the problem. Because these intermediate steps are themselves System 1 output. The system “thinks” by generating tokens that look like thinking. It doesn’t perform genuine verification. It produces a simulation of verification.
The difference: a human with System 2 can say “Wait, my intuition says X, but I know that in situations like this I tend to overestimate Y, so I should be more careful.” That requires a model of one’s own thought process. Metacognition.
An LLM can produce the sentence. But it has no model of its own inference process. It doesn’t know how it arrived at its answer. It cannot say: “This answer is based on thin data,” because it has no access to the data its answer rests on.
The Self-Vector as System 2
And this is where the self-vector enters.
A self-vector is a compact, dynamic state that weights information processing. Six dimensions: exploration, depth, autonomy, persistence, abstraction, confidence. Plus an emergent layer that forms through experience.
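As a minimal sketch, here is what such a state could look like. The six dimension names come from the text; the value range, the defaults, and the dictionary standing in for the emergent layer are my assumptions, since the essay does not fix a concrete representation.

```python
from dataclasses import dataclass, field

@dataclass
class SelfVector:
    # The six core dimensions named above, assumed here to be values in [0, 1].
    exploration: float = 0.5   # how actively the system seeks new information
    depth: float = 0.5         # how thoroughly it processes what it already has
    autonomy: float = 0.5      # answer on its own vs. ask back
    persistence: float = 0.5   # how long it stays with a line of work
    abstraction: float = 0.5   # concrete detail vs. general pattern
    confidence: float = 0.5    # how much it trusts its own assessment

    # Emergent layer: experience-dependent adjustments, represented here
    # (as an assumption) as a simple key-value mapping.
    emergent: dict[str, float] = field(default_factory=dict)
```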
The sixth dimension, confidence, is the decisive one. It encodes: how much does the system trust its own assessment? And it changes through experience. A system that has been wrong multiple times develops lower confidence in certain areas. Not because someone programmed it. But because the reflection layer (layer 3 of the architecture) updates the vector.
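One way “being wrong multiple times lowers confidence in certain areas” could work as an update rule, building on the SelfVector sketch above. The exponential moving average, the per-domain keys, and the name reflect are illustrative assumptions, not a description of the actual layer 3.

```python
def reflect(vector: SelfVector, domain: str, was_correct: bool, rate: float = 0.2) -> None:
    # Nudge the stored per-domain confidence toward the observed outcome:
    # toward 1.0 after a correct answer, toward 0.0 after a wrong one.
    key = f"confidence:{domain}"
    prior = vector.emergent.get(key, vector.confidence)
    target = 1.0 if was_correct else 0.0
    vector.emergent[key] = (1 - rate) * prior + rate * target

# Three wrong answers in one area lower confidence there, and only there.
v = SelfVector()
for _ in range(3):
    reflect(v, "tax_law", was_correct=False)
print(round(v.emergent["confidence:tax_law"], 2))  # 0.26, down from the default 0.5
```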
This is not a complete System 2. But it is the beginning of metacognition: a system that models its own state and derives from it how cautious it should be.
The difference from a reasoning model: the reasoning model simulates thinking. The self-vector weights thinking. It doesn’t say “I’m now thinking carefully.” It shifts the depth dimension upward and the confidence dimension downward, and this shift influences how all subsequent information is processed. Not as performance, but as a state change.
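Read as code, the shift is a change of numbers that later processing consumes, not a sentence the system utters. A small sketch on the same SelfVector; the shift size and the idea of a depth-scaled processing budget are assumptions for illustration.

```python
def enter_careful_mode(vector: SelfVector, shift: float = 0.2) -> None:
    # Not "I am now thinking carefully" as output, but a change of state.
    vector.depth = min(1.0, vector.depth + shift)
    vector.confidence = max(0.0, vector.confidence - shift)

def processing_budget(vector: SelfVector, base_steps: int = 4) -> int:
    # Everything downstream reads the state: more depth buys more work
    # (retrieval rounds, verification passes, whatever the pipeline offers).
    return round(base_steps * (1 + vector.depth))
```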
What This Means in Practice
Imagine two systems. Both receive the same question, and the correct answer lies outside their training data.
System A (standard LLM): generates a plausible answer. Fluent, convincing, wrong.
System B (with self-vector): the confidence dimension registers low familiarity with the topic. The exploration dimension rises: search for additional sources. The autonomy dimension drops: ask rather than answer. The system says: “I’m not familiar enough with this topic to give a reliable answer.”
Not because someone programmed that response. But because the self-vector shifted the weighting. That is functional metacognition. Kahneman would say: the system has a monitor for its own intuition. An internal doubter that asks: “Are you really sure?”
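Sketched as a branch on the same state, with illustrative threshold values and a stub standing in for the underlying model: System A is the default path, System B is what the shifted vector produces.

```python
def generate_answer(topic: str) -> str:
    # Stand-in for the underlying model's fluent answer path (System A).
    return f"Here is a confident answer about {topic}."

def respond(vector: SelfVector, topic: str, threshold: float = 0.4) -> str:
    # Per-topic confidence from the emergent layer, falling back to the global dimension.
    conf = vector.emergent.get(f"confidence:{topic}", vector.confidence)

    if conf < threshold:
        # System B: low confidence raises exploration and lowers autonomy,
        # and that shifted state yields a question instead of an answer.
        vector.exploration = min(1.0, vector.exploration + 0.2)
        vector.autonomy = max(0.0, vector.autonomy - 0.2)
        return "I'm not familiar enough with this topic to give a reliable answer."

    return generate_answer(topic)
```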
The Bridge
Kahneman showed where the problem lies: System 1 without System 2 is an intuition machine without brakes. Not stupid, but unbridled. That explains why LLMs are simultaneously impressive and unreliable. They have brilliant intuition and zero self-doubt.
The self-vector is the attempt to build structural self-doubt. Not as a disclaimer (“I am an AI system and can make mistakes”), but as a weighting function that actually changes behavior.
Whether that is sufficient is an open question. But the direction is right: not more intuition. More reflection.