
Validation Gates: My AI System Doesn't Trust Itself

On March 25, 2026, my AI system lied. Not intentionally. It interpreted a prepared figure from an email (958 impressions) as raw data and carried it forward into every subsequent text. The real number was entirely different. The error was only discovered when a second AI cross-checked the figures.

That wasn’t a slip. That was a systemic failure. And it exposed a fundamental problem.

The Problem: Circular Reasoning

What had happened: I had written an email to my son in which LinkedIn metrics were simplified — child-friendly, with rounded numbers. The AI system interpreted this email as a source and carried the simplified numbers as facts into every subsequent email. To my wife, to a friend, to colleagues. Every email got the wrong numbers because the system couldn’t distinguish between primary source (the LinkedIn dashboard) and preparation (the simplified email).

That’s circular reasoning: derived data is used as input data, which in turn produces new derivations. Each step moves further from the truth, but each step looks internally consistent. The system has no reason to doubt, because the numbers match its own texts — which it wrote itself.

This type of error isn’t an edge case. It’s structural. Any system that can use its own outputs as inputs is vulnerable to it.

The 6 Gates

The solution isn’t a better model. The solution is a verification layer that sits between input and storage.

Gate 1: Source-Pinning

Where does the information come from? Every fact gets a label: raw (primary source: a dashboard, a document, a direct statement), derived (prepared material: emails, presentations, summaries), or inferred (concluded by the system itself).

The rule is simple: only raw may serve as the factual basis for new content. derived is output, not input. This would have prevented the LinkedIn error: the email to my son would have been labeled derived, and the system would never have used it as a source.
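A minimal sketch of what Source-Pinning could look like in code. The Fact type, the label names, and the gate function are illustrative assumptions, not the system's actual schema:

```python
from dataclasses import dataclass
from typing import Literal

Provenance = Literal["raw", "derived", "inferred"]

@dataclass
class Fact:
    key: str                 # e.g. "linkedin.impressions"
    value: object
    provenance: Provenance   # raw = primary source, derived = prepared, inferred = system-generated
    source: str              # e.g. "linkedin_dashboard" or "email_to_son"

def usable_as_input(fact: Fact) -> bool:
    """Gate 1: only facts pinned to a primary source may feed new content."""
    return fact.provenance == "raw"

# The simplified email would never have passed this gate:
email_fact = Fact("linkedin.impressions", 958, "derived", "email_to_son")
assert not usable_as_input(email_fact)
```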

Gate 2: Contradiction Check

Does the new fact contradict an existing one? If so: don’t overwrite — flag it. Keep both values, document the contradiction, present it for human resolution.

Why not simply take the newer value? Because “newer” doesn’t mean “more correct.” The old value might come from a primary source, the new one from a derivation. Contradictions aren’t disruptions that need to be eliminated. They’re information that something, somewhere, is wrong.
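A sketch of the flag-don't-overwrite rule, again with assumed names rather than the system's real interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Conflict:
    key: str
    existing: object
    incoming: object

def store_fact(store: dict, key: str, incoming: object) -> Optional[Conflict]:
    """Gate 2: on contradiction, keep the old value, record both, and escalate to a human."""
    existing = store.get(key)
    if existing is not None and existing != incoming:
        return Conflict(key, existing, incoming)   # documented, not silently resolved
    store[key] = incoming
    return None
```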

Gate 3: Temporal Validation

Is the information still current? An IP address stored six months ago is suspect. A philosophical thesis formulated six months ago is probably still valid. The gate assigns expiration dates based on information type.
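A sketch of type-dependent expiry; the categories and time spans are placeholder assumptions, not fixed policy:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative time-to-live per information type (assumed values).
TTL: dict[str, Optional[timedelta]] = {
    "network_config": timedelta(days=30),   # an IP address ages quickly
    "metric":         timedelta(days=90),
    "thesis":         None,                 # stable claims don't expire
}

def is_current(info_type: str, stored_at: datetime) -> bool:
    """Gate 3: a fact becomes suspect once its type-specific lifetime has elapsed."""
    ttl = TTL.get(info_type)
    if ttl is None:
        return True
    return datetime.now(timezone.utc) - stored_at <= ttl
```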

Gate 4: Confidence Scoring

How reliable is the source? A fact from an official dashboard is labeled verified. A statement derived from a conversation is labeled unverified. Information that turned out to be wrong is labeled rejected: not deleted, but filtered out of active search.
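A sketch of the scoring and the filter; the labels mirror the three states above, everything else is assumed:

```python
from typing import Literal

Confidence = Literal["verified", "unverified", "rejected"]

def score_source(source_kind: str) -> Confidence:
    """Gate 4: confidence follows the source, not how convincing the wording sounds."""
    return "verified" if source_kind == "official_dashboard" else "unverified"

def active_search_set(facts: list[dict]) -> list[dict]:
    """Rejected facts stay stored (audit trail), but drop out of active search."""
    return [f for f in facts if f.get("confidence") != "rejected"]
```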

Gate 5: Scope Gate

Does the information belong in this context? Private data doesn't belong in work databases; work data doesn't belong in private databases. It sounds trivial, but it happens regularly when a system serves multiple contexts and the boundaries aren't explicitly defined.
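A sketch of an explicit scope rule; the store and scope names are made up for illustration:

```python
# Which scopes each store may accept (assumed names, not the real databases).
ALLOWED_SCOPES = {
    "work_db":    {"work"},
    "private_db": {"private"},
}

def in_scope(target_store: str, fact_scope: str) -> bool:
    """Gate 5: reject any write whose scope doesn't match the target database."""
    return fact_scope in ALLOWED_SCOPES.get(target_store, set())

assert not in_scope("work_db", "private")   # private data never lands in the work store
```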

Gate 6: Provenance Gate

Is the chain of origin traceable? Every fact needs an audit trail: who entered it, when, from which source, and has it been modified since? Without provenance, a fact is just a claim.
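A sketch of a fact that carries its own audit trail; the field names are illustrative, not the actual data model:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEntry:
    actor: str        # who entered or changed the fact
    source: str       # where it came from
    at: datetime      # when
    note: str = ""    # what changed

@dataclass
class ProvenancedFact:
    key: str
    value: object
    history: list[AuditEntry] = field(default_factory=list)

    def record(self, actor: str, source: str, note: str = "") -> None:
        """Gate 6: every write appends to the trail; nothing is modified silently."""
        self.history.append(AuditEntry(actor, source, datetime.now(timezone.utc), note))
```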

What This Has to Do with Kahneman

Kahneman described how System 1 (fast intuition) systematically produces errors that System 2 (slow analysis) must correct. LLMs are pure System 1. They produce answers at high speed and with high confidence, but without self-verification.

The Validation Gates are implemented System 2. Not as AI, but as deterministic code. And that’s precisely the point: System 2 doesn’t need to be intelligent. It needs to be reliable. A rule that checks whether a source is primary or derived doesn’t need creativity. It needs consistency.

The irony: a deterministic source check is closer to Kahneman’s System 2 than a reasoning model that “thinks” for 10,000 tokens. Because reasoning models extend the chain, but they don’t check against reality. They’re sophisticated System 1, not true System 2.

Three Error Patterns That Gates 1-6 Don’t Catch

The honest limitation: there are errors more subtle than wrong numbers.

Pseudo-statistics. The system invents numbers that sound plausible. “73% of knowledge workers” is a claim that no gate recognizes as a contradiction, because there’s no existing fact it contradicts. It passes all 6 gates despite being entirely fabricated.

Category errors. A metaphor is asserted as identity. “Validation Gates ARE an immune system” is false. “Validation Gates BEHAVE LIKE an immune system” is an analogy. The difference is subtle for a language model, but fundamental to the truth of a statement.

Emotional inflation. The system validates the user emotionally instead of informing them factually. “Your approach is groundbreaking” is the Barnum effect: it sounds good, it feels right, and it’s content-free. No gate catches this because it’s not a factual contradiction.

For these error classes, an additional verification layer is needed: parallel queries to multiple models, whose answers are compared in a fresh context. No single system can fully verify itself. But multiple independent systems can verify each other.
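A minimal sketch of such a cross-check. The majority rule and the stub callables are illustrative assumptions, not a fixed protocol; the point is only that the comparison happens outside any single model's context:

```python
from collections import Counter
from typing import Callable, Optional

def cross_check(question: str,
                models: dict[str, Callable[[str], str]]) -> tuple[Optional[str], dict[str, str]]:
    """Ask several models independently, each in a fresh context, and accept only a majority answer."""
    answers = {name: ask(question) for name, ask in models.items()}
    top_answer, votes = Counter(answers.values()).most_common(1)[0]
    if votes > len(models) // 2:
        return top_answer, answers      # agreement: accept
    return None, answers                # disagreement: escalate to a human

# Stub callables stand in for real model APIs:
result, raw = cross_check(
    "What does the primary source say?",
    {"model_a": lambda q: "A", "model_b": lambda q: "A", "model_c": lambda q: "B"},
)
print(result)   # "A": two of three independent answers agree
```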

The Philosophical Dimension

The Validation Gates are implemented epistemology. The question “What is knowledge?” becomes “What conditions must information meet to be treated as knowledge?”

This isn’t academic play. It’s the most practical question you can ask a knowledge system. And the answer that has proven itself in practice bears a striking resemblance to what philosophers have been discussing for centuries: knowledge is justified true belief. Source-Pinning is the justification. Contradiction Check is the truth verification. Confidence Scoring is the measure of belief.

The difference: philosophers discuss the theory. The gates implement it.

References

  1. Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux. ISBN 978-0-374-27563-1.
  2. Evans, J. St. B. T. & Stanovich, K. E. (2013). Dual-Process Theories of Higher Cognition. Perspectives on Psychological Science, 8(3), 223–241. DOI: 10.1177/1745691612460685
  3. Gettier, E. (1963). Is Justified True Belief Knowledge? Analysis, 23(6), 121–123. DOI: 10.1093/analys/23.6.121
  4. Geng, J. et al. (2024). A Survey of Confidence Estimation and Calibration in Large Language Models. NAACL 2024, 6577–6595. DOI: 10.18653/v1/2024.naacl-long.366
