Your Team Uses AI for Everything. Your Estimates Are Still Wrong.


The project was supposed to take six weeks. The team used AI for everything — code generation, documentation, test writing. Four weeks in, the scope had grown by 40%. The estimate wasn't revised until week seven. The project shipped in week twelve.

This is not an unusual story. It's not even a story most engineering teams would find surprising. What's surprising is the conclusion they draw from it: that they need better estimation techniques. More planning poker. Tighter sprint ceremonies. Better story point calibration.

None of that is the problem.

The Number That Should Have Ended This Debate

The Standish Group has been tracking IT project outcomes for over twenty years. Their CHAOS Report data is contested in its methodology but consistent in its direction: roughly 66% of enterprise software projects experience some form of cost or timeline overrun. McKinsey's research on large IT programs found that 17% of them go so badly they threaten the existence of the company, and that 20% fail to deliver promised benefits even when they ship on time.

These numbers predate AI coding tools by decades. They existed when developers typed every line manually. They existed in the era of waterfall and in the early years of Agile. They exist today with Copilot and Claude and Cursor running in every IDE.

If slow typing were the cause of estimation failure, faster typing would have improved those numbers by now. It hasn't. The estimate problem is structural, not mechanical. And adding AI to a structural problem without changing the structure doesn't help. It amplifies.

Why Estimates Fail (It Was Never About Typing Speed)

Software estimation is hard for one specific reason: scope is uncertain, and teams price it as if it isn't.

A project estimate is a prediction about the total amount of work required to deliver a defined outcome. The problem is that the outcome is rarely actually defined, and the work required to deliver any given set of features depends on things you don't discover until you're building them — integration friction, edge cases, performance problems, requirements that turn out to be ambiguous when you try to implement them.

These aren't failures of planning rigor. They're inherent properties of complex systems. The academic research on this is unambiguous: studies conducted at Oxford and published in journals like the Journal of Information Technology have documented a power-law distribution in project cost overruns, not a normal distribution. Most projects run slightly over. A significant minority run catastrophically over. The catastrophic overruns are driven by unknown unknowns, not by miscounting story points.
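The distribution shape matters more than it sounds. Under a normal distribution, the typical overrun and the average overrun look alike; under a power law, the average is dominated by rare catastrophes. A minimal simulation makes the difference concrete. Every parameter below is invented for illustration; nothing here is fitted to the CHAOS or Oxford data:

```python
# Illustrative only: compare overruns under a normal distribution vs. a
# heavy-tailed (power-law) one. All parameters are invented for this
# sketch; they are not fitted to the CHAOS or Oxford data.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Normal world: overruns cluster around +10% with modest spread.
normal = rng.normal(loc=0.10, scale=0.15, size=n)

# Power-law world: numpy's pareto() returns classical-Pareto samples
# minus 1, so shift back and scale. alpha=1.2 gives a finite mean but
# a very heavy tail.
powerlaw = 0.05 * (rng.pareto(1.2, size=n) + 1) - 0.05

for name, x in (("normal", normal), ("power-law", powerlaw)):
    print(f"{name:>9}: median={np.median(x):+.0%}  "
          f"mean={np.mean(x):+.0%}  p99={np.percentile(x, 99):+.0%}")
```

In the normal world, the typical project and the average project are nearly the same. In the power-law world, the median project comes in a few percent over while the 99th percentile comes in at multiples of the estimate. That's why "most of our projects land close to the estimate" is fully compatible with a portfolio that bleeds money through its tail.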

Estimation techniques address known uncertainty. They have no mechanism for unknown unknowns. The only thing that addresses unknown unknowns is scope discipline — the practice of saying no to additions mid-project, of defining delivered scope tightly enough that you know when you're done, of treating scope expansion as a new project requiring a new estimate.

Most teams do not do this. They estimate once, start building, discover complexity, add features because they're now easy to add (more on that shortly), and then explain to stakeholders why the timeline has moved — again.

What AI Actually Does to This Problem

Faros AI's 2025 research on developer productivity with AI tools produced a finding that deserves to be stated plainly: developers reported a perceived 24% speed gain, while measured delivery time showed a 19% slowdown. Pull request review time increased 91%. The number of merged PRs increased 98%.

More output. Slower delivery. The paradox resolves cleanly once you look at the whole pipeline: AI makes individual coding tasks cheaper and faster, which means more tasks get created and completed, which means more code enters review and integration pipelines that weren't scaled to handle the volume. Speeding up one stage of a multi-stage process doesn't speed up the process; it creates a bottleneck at the next stage.
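A toy model shows the mechanics. Suppose coding output doubles while review capacity stays fixed. The rates below are invented for illustration; they are not the Faros AI measurements:

```python
# Toy two-stage pipeline: coding feeds review. The rates are invented
# for illustration; they are not the Faros AI measurements.
def run_pipeline(coding_rate, review_rate, days=30):
    """Return (merged PRs per day, unreviewed backlog after `days`)."""
    backlog = 0.0
    merged = 0.0
    for _ in range(days):
        backlog += coding_rate            # new PRs enter the review queue
        done = min(backlog, review_rate)  # review capacity is fixed
        backlog -= done
        merged += done
    return merged / days, backlog

for coding_rate in (5, 10):  # PRs/day: before AI, and with output doubled
    throughput, backlog = run_pipeline(coding_rate, review_rate=6)
    print(f"coding {coding_rate:>2}/day -> merged {throughput:.1f}/day, "
          f"backlog after 30 days: {backlog:.0f} PRs")
```

Doubling the input rate lifts merged throughput by only 20%, because that's where the review ceiling sits. The rest of the "extra productivity" accumulates as work in progress, which is consistent with review times growing even as merged PR counts climb.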

But the more important effect is on scope. When individual features become cheap and fast to build, the question "should we build this?" gets displaced by "we might as well, since it's easy." This is how scope expands in AI-assisted development: not through deliberate decisions to expand scope, but through a hundred small decisions where the cost of adding something feels negligible. It isn't negligible — it's cumulative. Every additional feature adds QA surface, documentation requirements, integration complexity, and maintenance burden that the estimate never captured.
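The arithmetic is easy to ignore one feature at a time and hard to ignore in aggregate. A back-of-envelope sketch, with multipliers that are assumptions chosen for illustration rather than measured data:

```python
# Back-of-envelope sketch. Every multiplier here is an assumption chosen
# for illustration, not measured data.
CODING_HOURS = 4                 # what the feature "costs" with AI help
DOWNSTREAM_HOURS = {             # hypothetical per-feature overhead
    "review": 3,
    "QA surface": 5,
    "documentation": 2,
    "integration": 4,
    "maintenance, first year": 8,
}

extras = 20                      # "might as well" additions over a project
visible = extras * CODING_HOURS
hidden = extras * sum(DOWNSTREAM_HOURS.values())
print(f"visible: {visible}h  hidden: {hidden}h  "
      f"({hidden / visible:.1f}x the part that felt negligible)")
```

Twenty small additions that each "only take an afternoon" can carry several times their visible cost in downstream work, and none of it was in the estimate.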

The Gradle research team described this as workload creep: when individual tasks get cheaper, total work volume expands because more work gets requested and accepted. The factory-line analogy is useful — speed up one machine in an assembly line while leaving the others unchanged and you don't get a faster factory. You get a pile-up at the next station.

The Vibe Prototype Trap

There's a specific AI-era pathology that engineering teams are just beginning to name: the vibe prototype trap.

AI tools make it trivially easy to generate a working-looking prototype: a UI that functions, routes that respond, interactions that animate. Non-technical stakeholders see this prototype and make a category error: they conclude that the product is mostly done and that the remaining work is finishing and polishing. Estimation conversations happen in the shadow of that category error.

What's actually true is that a vibe prototype built in a weekend with AI assistance has addressed approximately 10% of the actual engineering problem. The prototype has no error handling. Its architecture won't survive contact with the real data model. The database interaction is mocked. The authentication is imaginary. None of the edge cases are handled. Accessibility is nonexistent.

But the demo worked, and the demo looked real, so the estimate that gets set (and, more importantly, the expectation that gets established with stakeholders) is based on work that looks complete but is nowhere near complete. The team then spends the next two months building the real version while appearing to be "just finishing things up."

This is not a new pattern. It predates AI. But AI has made generating the misleading prototype so fast and so compelling that the gap between "demo ready" and "production ready" has widened considerably.

What Actually Helps

None of this means AI tools aren't valuable. They are. The error is treating them as a solution to estimation without addressing estimation's actual problem.

A few things do help:

Scope lockdown after kickoff. Once an estimate is set, scope additions require a new estimate for the addition. Not "we'll figure it out" — a literal, separate estimate. This creates friction that makes scope expansion a decision rather than an accident.

Separating prototype timelines from build timelines. The time it takes to demonstrate a feature tells you almost nothing about the time it takes to ship it. The two should be tracked separately and never allowed to inform each other in stakeholder communication.

Estimation at the integration layer, not the feature layer. Most estimation happens at the feature level: "how long will the login page take?" But projects fail at integration — when the login page's auth scheme conflicts with the session management, which conflicts with the third-party identity provider. Estimating integration work explicitly, before it's a crisis, is where estimation actually becomes useful.

Smaller commitments, more often. The research on this is consistent: teams that commit to smaller batches of work and deliver frequently have more accurate estimates than teams that commit to large releases with long cycles. Not because smaller batches are inherently easier to estimate, but because they surface unknown unknowns faster, when the cost of addressing them is still low. The sketch below makes the effect concrete.
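Here's a toy model of that last point, under one assumed failure mode: a single hidden problem sits at a random point in the scope, surfaces only when the batch containing it ships, and invalidates everything built after it within that batch:

```python
# Toy model, under an assumed failure mode: one hidden problem sits at a
# random point in the scope, surfaces only when its batch ships, and
# everything built after it within that batch is rework.
import random

random.seed(7)

def expected_rework(batch_size, total=100, trials=100_000):
    rework = 0.0
    for _ in range(trials):
        flaw = random.uniform(0, total)            # where the surprise lives
        batch_end = (int(flaw // batch_size) + 1) * batch_size
        rework += min(batch_end, total) - flaw     # built before it shipped
    return rework / trials

for batch in (100, 25, 5):  # one big release vs. progressively smaller batches
    print(f"batch size {batch:>3}: expected rework ~ {expected_rework(batch):.1f} units")
```

In this model, expected rework is roughly half a batch: shrink the batch and you shrink the blast radius of every unknown unknown.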

The AI tools don't change any of this. They make code faster to produce, which is genuinely valuable. But estimation was broken before, and it's still broken now — not because teams lack tools, but because the constraint was never typing speed. It was always scope, and no one has figured out how to make scope discipline fast.



Related: You Feel Faster With AI. You're Actually Slower. Here's Why Both Are True. · You Don't Understand Your Own Codebase Anymore