Explanation

Explanation is the translation layer of augmented intelligence. It transforms complex representations — whether from a neural network, a dataset, or your own reasoning — into communicable form. Without it, AI output is opaque, and opaque output cannot be trusted.

Advanced · 9 min read

The Explanation Problem

Ask a large language model to explain why it gave a particular answer. It will produce a fluent, confident paragraph. It will sound like an explanation. It will have the grammatical structure of an explanation. But it is not an explanation — it is simply more generated text.

This is the core problem. AI systems can produce outputs that resemble reasoning, but the process that generated those outputs is fundamentally different from human reasoning. A language model does not "decide" to give an answer and then explain its decision. It generates the answer and the explanation through the same statistical process — predicting the next token in a sequence.

When someone asks you why you chose a particular restaurant, you can trace your actual reasoning: you wanted Thai food, this place had good reviews, it was within walking distance. Each element of the explanation corresponds to an actual factor in your decision. AI explanations do not have this correspondence. The "explanation" is a post-hoc construction that may or may not reflect the computational process that produced the output.

This is not a minor technical issue. It is a fundamental challenge that shapes how you should interact with every AI system you use.

Genuine Explanation vs Generated Text

David Deutsch's concept of hard-to-vary explanations gives us a precise tool for distinguishing genuine explanation from noise. A genuine explanation has specific properties:

  • Every component does essential work. You cannot swap parts of the explanation without breaking it. If you can replace "the model hallucinated because the training data was sparse in this domain" with "the model hallucinated because Mercury was in retrograde" and the explanation feels equally plausible, then neither version is actually explaining anything.
  • It is traceable. You can follow the chain of reasoning back to evidence. A genuine explanation connects its conclusion to observable facts through a chain of logic that you can inspect and test.
  • It is falsifiable. A real explanation makes predictions. If the explanation is correct, certain things should follow. If those things do not follow, the explanation is wrong. Generated text rarely makes falsifiable claims — it hedges, qualifies, and keeps its options open.

The test: When AI gives you an explanation, ask — can I remove or change any part of this and still reach the same conclusion? If yes, the explanation is easy to vary, and you should not trust it. If removing any element breaks the reasoning chain, you may have a genuine explanation worth building on.

Interpretability and the Black Box

Modern AI systems are often described as black boxes — you put data in, you get results out, but you cannot see what happens in between. This is not a metaphor. A large neural network contains billions of parameters whose interactions are genuinely opaque, even to the researchers who built it.

Interpretability research attempts to open the black box. Techniques like attention visualisation, feature attribution, and mechanistic interpretability try to answer: what patterns is the model using? Which parts of the input influenced the output? What internal representations did the model form?

This work is valuable but limited. Current interpretability methods can tell you which input tokens the model weighted most heavily, but they cannot tell you why those tokens mattered in a way that constitutes a genuine explanation. The gap between "what the model did" and "why the model did it" remains large.
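Feature attribution, one of the techniques named above, can be made concrete with a toy sketch. The simplest variant is occlusion (leave-one-out) attribution: remove each input token in turn and measure how much the model's score drops. The `score` function below is a stand-in invented for illustration, not a real model; real methods operate on actual network internals, but the logic is the same. Note that the result tells you which tokens mattered, not why they mattered.

```python
def score(tokens):
    # Toy stand-in for a model's confidence score: counts
    # sentiment-bearing words. A real model would be a network.
    positive = {"good", "great", "reliable"}
    return sum(1.0 for t in tokens if t in positive)

def occlusion_attribution(tokens, score_fn):
    """Attribute the score to each token by leave-one-out occlusion:
    the attribution of a token is the score drop when it is removed."""
    base = score_fn(tokens)
    attributions = {}
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]
        attributions[tok] = base - score_fn(occluded)
    return attributions

tokens = ["the", "reviews", "were", "good", "and", "reliable"]
attr = occlusion_attribution(tokens, score)
print(attr)  # tokens the toy model relies on get non-zero attribution
```

Even here, the gap the text describes is visible: the attribution identifies "good" and "reliable" as influential, but it says nothing about why the model treats those words as it does.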

For practitioners, this means you cannot rely on the AI to explain itself. You must build explanation from the outside — by testing the output against known facts, by comparing results across different prompts, by verifying claims independently. The explanation comes from your process, not from the model.

Knowledge Distillation

Knowledge distillation is the process of compressing complex knowledge into simpler, more communicable forms. In machine learning, it literally means training a smaller model to replicate the behaviour of a larger one. But the concept applies far more broadly.
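In its machine-learning sense, distillation trains the student to match the teacher's softened output distribution rather than just its top answer. A minimal sketch of the standard distillation loss (KL divergence between temperature-softened distributions), with toy logits chosen purely for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature
    softens the distribution, exposing the teacher's 'dark knowledge'
    about which wrong answers are almost right."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's. The student minimises this to mimic the teacher."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]   # large model's logits over three classes
student = [3.0, 1.5, 0.5]   # smaller model's logits
loss = distillation_loss(teacher, student)
```

The loss is zero only when the student reproduces the teacher's full distribution, which is the formal version of the broader point: distillation preserves relationships, not just conclusions.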

Every time you take a 50-page research paper and extract three key findings, you are performing knowledge distillation. Every time you create a diagram that captures the essential logic of a complex system, you are distilling knowledge. The value is not just compression — it is the creation of understanding.

AI can assist with distillation, but it cannot do it for you. A language model can summarise a paper, but it cannot determine which findings are most important to your specific context. It can generate a diagram description, but it cannot judge whether the diagram captures the essential relationships or misses a crucial one. The distillation requires your understanding of what matters — the AI provides speed, you provide judgement.

Effective distillation produces what this framework calls cognitive artifacts — encapsulated structures of meaning that can be shared, reused, and built upon. The quality of your distillation determines the quality of these artifacts, which in turn determines how effectively you and others can reason about the underlying knowledge.

Making AI Output Verifiable

If you cannot explain how an AI reached its output, the next best thing is to make the output verifiable — to structure your workflow so that AI claims can be checked against reality before you act on them.

Practical strategies for verifiable AI output:

  1. Demand sources. When AI makes factual claims, ask for specific sources. Then check those sources. AI frequently fabricates citations — the fact that it provides a reference does not mean the reference exists.
  2. Decompose claims. Break complex AI outputs into individual claims and verify each one separately. A paragraph that is 90% correct is still dangerous if the 10% that is wrong is the part you act on.
  3. Use structured output. Ask AI to provide output in structured formats — tables, schemas, step-by-step reasoning — that make individual claims visible and checkable. Freeform prose hides assumptions; structure exposes them.
  4. Cross-reference. Run the same query through different models or different prompting strategies. Where the outputs agree, you have higher confidence. Where they disagree, you have identified an area that requires human investigation.
  5. Test predictions. If the AI's output implies certain things should be true, test those implications. A recommendation that "users will prefer option A" can be tested with actual users. An analysis that "this code will fail under load" can be tested with a load test.

The principle: You do not need to understand how the AI works internally. You need to build a process around the AI that catches errors before they reach production. Explanation is not just about understanding the model — it is about making the entire workflow transparent and correctable.

Explanation as a Practice

In the augmented intelligence framework, explanation is not a feature you wait for AI companies to build. It is a practice you develop. Every time you take an AI output and translate it into a form that someone else (or your future self) can understand and verify, you are practising explanation.

This connects directly to meta-cognition — the feedback loop that governs your AI interactions. Meta-cognition asks "am I thinking about this correctly?" Explanation asks "can I communicate what I have found in a way that is traceable and verifiable?" They are complementary disciplines: one governs your internal process, the other governs the output.

Explainer agents — AI systems designed specifically to make other AI systems' outputs understandable — are an emerging capability. But they face the same fundamental challenge: their explanations are also generated text. The human in the loop remains essential, not as a bottleneck but as the only agent in the system capable of genuine understanding.

Continue Learning

Explanation is the translation layer. Next, explore how knowledge persists and grows through memory systems.