You Feel Faster With AI. You're Actually Slower. Here's Why Both Are True.

The METR study asked experienced open-source developers to complete real tasks in their own repositories, with and without AI coding assistance, then timed them. The developers using AI took 19% longer.
This is the result almost nobody has engaged with directly. GitHub surveys say 85% of developers feel faster. Stack Overflow's data says AI tools increase productivity. The METR result looks like a fluke — until you examine what each study was actually measuring.
What "Feeling Faster" and "Being Faster" Measure Are Different Things
The GitHub and Stack Overflow surveys measure perceived speed. Developers report how fast they feel. The feeling is real: AI autocomplete reduces keystrokes, fills in boilerplate instantly, and makes the gap between idea and initial code vanishingly small. If you measure the time from empty file to first passing test, AI shortens it. That part is true.
The METR study, published in July 2025, measured objective task completion time for experienced developers working in real-world repositories. Not a speed-to-first-function metric. The full cycle: understand the codebase, write the change, review the output, fix the bugs, get it through tests. Over that full cycle, experienced developers with AI took more time than without it.
The interesting question isn't whether one number is right. Both are. The question is why they diverge.
The Verification Overhead Nobody Budgets For
Every AI suggestion requires a decision. Accept, reject, modify. When you accept, you still have to read it — really read it — to understand what it did and whether it did the right thing. You have to mentally trace the edge cases it might not have considered. You run the tests, watch them fail in unexpected ways, then figure out whether the failure is in the AI's code or your test setup.
GitHub's own data shows developers accept about 30% of AI suggestions; the other 70% are rejected or rewritten. But even the 30% you accept aren't free: you've read and verified them, and verification has a time cost the autocomplete metric doesn't capture.
This is the verification overhead. The cost of every suggestion isn't just the time it takes to generate; it's the time it takes to evaluate. In a domain where you could type the code yourself in 20 seconds, getting an AI suggestion in 3 seconds is only faster if verification takes less than 17 seconds. For simple boilerplate, it does. For logic involving your specific codebase's conventions, invariants, and edge cases, it often doesn't.
The senior engineer slowdown reflects this at the team level: the engineers with the deepest codebase knowledge spend the most time reviewing AI output, because they're the only ones who can catch the subtle errors. The verification burden concentrates on exactly the people you least want burdened.
Two Workflows Running on One Brain
Deep programming requires a specific mental state. You're holding the architecture in working memory, tracking variable state across function calls, anticipating how the current change will interact with code you haven't written yet. Research on flow, the deep-focus state Mihaly Csikszentmihalyi named, suggests this kind of concentration takes 10–15 minutes to enter, and any interruption breaks it.
AI coding introduces a new class of interruption at the center of the task. Every suggestion is a decision point: evaluate this, decide yes or no, integrate or discard. The workflow rhythm changes from "think → type" to "think → prompt → wait → read → decide → integrate → re-orient." The re-orient step is the expensive one. Each time you evaluate a suggestion and it's wrong or off, you pay the cost of snapping back to your prior mental model.
For senior developers, who already move fast and hold large mental models, this friction is often net negative. For developers earlier in their careers, who benefit more from scaffolding and less from flow depth, the calculus differs. The METR study specifically measured experienced developers — which is why the result may surprise people who've only seen survey data from the general developer population.
This connects to what happens to comprehension when AI writes code: the code ships, but the understanding doesn't always transfer. Two different productivity metrics, pointing at two different costs.
The 19% Isn't the Problem — the Mismatch Is
The METR study doesn't show AI is bad. It shows something narrower: that the current generation of AI coding tools, used in the current workflow, by experienced developers on real codebases, doesn't produce measurable speed gains when objective task completion time is the metric. That's a specific finding for a specific population on a specific metric.
What it surfaces is a design problem, not a capability problem. The tools were built to reduce keystrokes. The bottleneck isn't keystrokes. For experienced developers, the bottleneck is understanding — understanding what the codebase needs, understanding what the AI produced, understanding where they diverge. Current tools don't address that bottleneck directly.
The verification gap is the structural version of this: AI makes code generation faster while comprehension stays constant — or degrades. Productivity measured at code generation looks good. Productivity measured at code quality, incident response, or time-to-understand looks different.
What Faster Would Actually Require
The tools that would genuinely speed up experienced developers aren't ones with better autocomplete. They're ones that understand why a piece of code exists — the intent, the constraints, the conventions that aren't written down anywhere but are embedded in every decision made over three years of development.
That's a different problem from predicting the next token. It's a knowledge retrieval and representation problem. The tools building toward it (code search with semantic understanding, architecture reasoning, convention inference) are early. What's commercially available right now is mostly sophisticated autocomplete dressed in an agentic UI.
The productivity narrative got ahead of the tooling. Eighty-five percent of developers feel faster, which is true if you count the keystrokes they've saved. The full-cycle measurement says they're slower, which is also true if you count the decisions they've added. Both will remain true until the workflow is redesigned around the actual bottleneck.
That redesign hasn't happened yet. Nobody's sure what it looks like.
Cover photo by Daniil Komov via Pexels.