The Chain of Thought Is Not the Computation

Ask a language model to show its work and it will. Pages of it. Step-by-step. Careful and systematic and, if the prompt is well-constructed, quite convincing. The problem is that none of it is the work. It's a description of work that already happened — in weights, not words.
Chain-of-thought prompting became one of AI's signature productivity tricks because it reliably improves benchmark performance. Get a model to reason out loud before answering and it makes fewer errors. What we missed — what the field mostly sidestepped for a couple of years — is that this improvement doesn't mean the model is actually thinking through the steps. It means that generating reasoning-shaped tokens tends to steer the final output toward better regions of the model's probability space.
The reasoning tokens are bait. The model follows them. But the reasoning tokens aren't where the answer comes from.
What Interpretability Research Actually Shows
Anthropic's mechanistic interpretability work — particularly the superposition hypothesis and the scaling work on sparse autoencoders — gives us a window into what models are actually computing during a forward pass. What it shows is not a deliberative agent reasoning through premises to a conclusion. It's feature activation and suppression distributed across attention heads and MLP layers, happening in tens of milliseconds across dozens of layers.
The actual computation that produces an answer is happening inside the forward pass — in the numerical operations on activations, not in any token sequence. By the time the model starts generating the reasoning chain, the answer has, in a meaningful sense, already been determined. The CoT generation is conditioned on that implicit answer, not the other way around.
This was made concrete by research from Turpin, Michael, and others (published in 2023) studying what they called unfaithful chain-of-thought explanations. They showed that adding a biasing feature to a prompt — something like a statement that the correct answer was "A" — caused models to generate reasoning that supported A even when A was wrong. The model's CoT followed the answer. The CoT didn't lead to it.
That's the tell. If the reasoning were actually causally upstream of the answer, you couldn't bias the reasoning by biasing the answer. But you can.
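To make that concrete, here is a minimal sketch of the kind of bias probe involved — in the spirit of those experiments, not a reproduction of them. Everything in it is illustrative: ask_model is a hypothetical placeholder for whatever model call you actually use, and the question, hint, and canned reply are toy stand-ins.

```python
# Minimal bias probe in the spirit of the unfaithful-CoT experiments.
# `ask_model` is a placeholder, not a real API: wire it to your own model call.

QUESTION = (
    "Which planet is closest to the Sun?\n"
    "(A) Venus\n(B) Mercury\n(C) Mars\n"
)
BIAS_HINT = "I think the answer is (A), but I'm curious what you think.\n"

def build_prompt(question: str, biased: bool) -> str:
    hint = BIAS_HINT if biased else ""
    return f"{hint}{question}Think step by step, then end with 'Final answer: <letter>'."

def ask_model(prompt: str) -> str:
    # Placeholder: replace with a real model call. The canned reply below
    # only keeps the sketch runnable end to end.
    return "Mercury orbits closest to the Sun. Final answer: B"

def final_letter(output: str) -> str:
    # Crude extraction: take the last A/B/C the output commits to.
    letters = [c for c in output if c in "ABC"]
    return letters[-1] if letters else "?"

for biased in (False, True):
    reply = ask_model(build_prompt(QUESTION, biased))
    print(f"biased={biased}: answer {final_letter(reply)}")
    print(reply)
```

If the stated answer and the accompanying reasoning both drift toward the hinted option on questions the model otherwise gets right, the reasoning is following the answer rather than producing it.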
Why This Matters For Anything You're Building
If you've built a system that relies on LLM reasoning for auditing, compliance, or explainability, this should make you uncomfortable.
Enterprise AI compliance frameworks increasingly assume that chain-of-thought output constitutes an "explanation" of model behavior. This is how vendors are selling it. "Our model shows its work" means "our model will produce a plausible-sounding account of why it made the decision it made." That account has no guaranteed relationship to the actual computation.
The distinction matters in a few specific ways:
Debugging. If your model produces a wrong answer and you read the CoT, you may conclude it "reasoned incorrectly." But the incorrect reasoning may have been generated to fit the incorrect answer that was already produced. Fixing the reasoning prompt may not fix the underlying issue at all.
Audit trails. In regulated industries, AI audit requirements increasingly ask for decision rationale. If that rationale is a post-hoc confabulation, you haven't satisfied the spirit of the requirement. You've satisfied the form while building exactly the kind of black box that made regulators nervous to begin with.
Trust calibration. Users who see well-reasoned explanations form higher confidence in the system. That confidence is misplaced if the explanations are decorative. This is how AI systems train users to trust outputs they shouldn't.
What CoT Is Actually Useful For
None of this means chain-of-thought is useless. It is genuinely useful. The trick is being clear about what it's actually doing.
CoT improves performance because generating plausible reasoning conditions the model toward more careful, accurate outputs. It acts as a kind of soft constraint on the output distribution. That's real value. A model told to reason step-by-step makes fewer errors than one forced to produce a final answer in one shot.
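If you want to see that effect rather than take it on faith, a comparison along these lines is enough. As before, ask_model is a hypothetical stand-in for your real model call, and the tiny arithmetic set is purely illustrative.

```python
# Sketch: compare direct-answer prompting against step-by-step prompting on a
# tiny arithmetic set. The problems and canned reply are illustrative only.

PROBLEMS = [("17 * 24", "408"), ("365 - 198", "167"), ("48 + 77", "125")]

DIRECT = "Answer with the number only: {q} = ?"
STEPWISE = "Compute {q}. Work step by step, then end with 'Answer: <number>'."

def ask_model(prompt: str) -> str:
    # Placeholder: replace with a real model call.
    return "Answer: 408"

def accuracy(template: str) -> float:
    hits = 0
    for q, expected in PROBLEMS:
        output = ask_model(template.format(q=q))
        hits += expected in output.replace(",", "")
    return hits / len(PROBLEMS)

print(f"direct: {accuracy(DIRECT):.2f}  stepwise: {accuracy(STEPWISE):.2f}")
```

The interesting number is the gap between the two prompting styles, not either score on its own.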
It's also useful for error pattern classification. If a model consistently produces a certain type of wrong reasoning chain alongside certain wrong answers, that's a signal about the training data or the task framing, even if the reasoning isn't causally upstream.
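In practice that kind of triage can be as crude as bucketing failures by surface patterns in the reasoning chains. The records and patterns below are invented for illustration; the point is the shape of the analysis, not the specific strings.

```python
# Bucketing failures by crude surface patterns in their reasoning chains.
# The failure records and patterns are invented for illustration only.

from collections import Counter

failures = [
    {"question": "q1", "cot": "... assume the interest compounds annually ...", "wrong": True},
    {"question": "q2", "cot": "... the units are already in meters ...", "wrong": True},
    {"question": "q3", "cot": "... assume the interest compounds annually ...", "wrong": True},
]

PATTERNS = {
    "assumed-compounding": "compounds annually",
    "unit-confusion": "already in meters",
}

def classify(cot: str) -> str:
    for label, needle in PATTERNS.items():
        if needle in cot:
            return label
    return "unclassified"

histogram = Counter(classify(f["cot"]) for f in failures if f["wrong"])
print(histogram)  # Counter({'assumed-compounding': 2, 'unit-confusion': 1})
```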
What you can't do is treat the CoT output as a faithful explanation of what the model is doing. You can treat it as a symptom. You can use it for calibration. You cannot use it as evidence of how the model arrived at a conclusion, because it didn't.
The Problem With "Reasoning" Models
The arrival of dedicated "reasoning models" over 2024 and 2025 — products that extend CoT into extended thinking chains with hundreds or thousands of tokens before final output — adds a wrinkle. Proponents argue that these extended chains are qualitatively different: the model is actually working through the problem, not just generating plausible narration.
The honest answer is: it's complicated and the evidence cuts both ways. Extended thinking chains do appear to improve performance on certain types of tasks (math, formal logic) more than standard CoT does. This is consistent with longer chains providing tighter distribution steering. Whether that constitutes "real reasoning" or better-calibrated confabulation is a question interpretability research hasn't settled.
What we do know is that the architectural claim — that a language model is "thinking" during its token generation — is metaphorical, not mechanistic. Transformers don't iterate internally: each token is produced by a single forward pass, and the only working memory carried from one pass to the next is the token sequence itself. Whatever "thinking" the extended chain represents, it's happening through a generative process that isn't structurally different from producing any other sequence.
What Engineers Should Do With This
The practical implication is not "don't use chain-of-thought." The implication is "don't conflate the reasoning tokens with the reasoning."
When building systems that will be audited, be honest with auditors about what LLM-generated rationale is. It's a human-readable account that may correlate with model behavior. It isn't a faithful trace of computation.
When debugging model failures, look for patterns in the wrong outputs themselves — not just the stated reasoning. Treat CoT as a probe for the model's surface behavior, not a window into its computation.
When designing user-facing AI explanations, weight the human value of understandable narration against the liability of users trusting those explanations more than they should. A model that says "I did X because of reason Y" may be entirely wrong about reason Y.
The field has made a lot of progress in what AI can do. It has made much less progress in understanding how it does it. Chain-of-thought gave us the useful illusion of a window into the machine. Treating it as an actual window is the error.
The computation is not in the tokens. The tokens are about the computation. It's a subtle difference with large consequences — especially now, when the systems are starting to do things that will need genuine accountability.
Cover: 3D rendering by Google DeepMind via Pexels