Vibe Coding Breaks When Someone Has to Maintain It


The PR merged on a Friday afternoon. Four hundred lines, written in forty minutes. The reviewer approved it because it looked fine. The tests passed. The feature shipped.

Six months later, nobody can explain what the useTransactionReconciler hook actually does or why it was written that way. The original developer has left. The AI that generated the code has no memory of the session. And the hook is now doing something slightly wrong in an edge case that nobody caught — because nobody understood it well enough to write a test for the right thing.

This is the new technical debt. Not the kind you accumulate from moving fast and cutting corners. The kind you accumulate from shipping code without ever building the understanding behind it.

What Vibe Coding Actually Is (And Isn't)

"Vibe coding," the term Andrej Karpathy coined in early 2025, describes programming by intent rather than by line. You describe what you want, the AI generates it, you verify it works, you move on. You don't need to understand every line to ship a working feature. The AI is the implementer; you're the specifier.

The productivity case is real. A 2025 study from MIT CSAIL found developers using AI coding tools completed tasks 55% faster on average. GitHub's 2026 developer survey put Copilot usage at 73% of professional developers, with the majority citing it as their primary coding accelerator. That is not noise.

What nobody is measuring cleanly is the maintenance side. What happens to code written by someone who didn't have to understand it?

The Intent Gap

Traditional technical debt is well-characterized: you made a decision you knew was suboptimal, deferred a refactor, skipped tests under deadline pressure. The debt is documented, at least implicitly. You know where the bodies are buried because you buried them.

Orphaned-intent debt, the kind vibe coding produces, is different. The code is often clean: AI tools write reasonably idiomatic code, follow patterns, and handle null cases. The debt is cognitive. The link between business requirement and implementation decision was never formed. Nobody chose this architecture. The AI proposed it, it worked, it shipped.

When you need to change it later — add a new field, handle a new state, fix a race condition — you are not refactoring something you understand. You are reverse-engineering something nobody understood in the first place.

The AI cannot help you here, either. Ask it why the useTransactionReconciler hook uses useLayoutEffect instead of useEffect, and it will give you a plausible answer. That answer may have nothing to do with the real reason, which is that when the code was generated six months ago, useLayoutEffect produced output that passed the test case it was given. That's not an architectural decision. That's an accident that survived review.

Plausible explanations for code that was never intentionally designed are not documentation. They are confabulation.

Where Orphaned-Intent Debt Shows Up First

The first place it compounds is not in bug counts. It's in onboarding.

A senior engineer at a B2B SaaS company described it to me: "We used to be able to sit a new hire down with the codebase and explain the decisions. Why did we pick this state management approach? Because we had this constraint. Why does this API work this way? Because of this business rule. Now we have whole sections where the answer is 'the AI wrote it that way and it worked.' That's not knowledge transfer. That's an archaeological dig."

The second place it surfaces is in testing coverage on failure paths. Tests written by someone vibe-coding a feature tend to cover the happy path thoroughly — because that's what you verify to confirm the feature works. They miss the edge cases that would require understanding why the code does what it does.

A 2026 paper from the University of Melbourne analyzing 10,000 GitHub repositories found AI-assisted pull requests had 23% lower test coverage on error paths compared to human-written pull requests of similar size. The feature worked. The question of what "working" meant in failure states had never been fully answered.

Four Patterns That Actually Help

Telling teams to "understand everything they ship" is both correct and impractical. The productivity gains are too real to forfeit. The question is how to keep the speed without losing the understanding.

Require intent documentation before generation. Before you prompt, write a comment explaining the business requirement, the constraints, and the failure modes you care about. This forces thinking before delegation and embeds the reasoning in the codebase, where it outlasts the original developer's memory of it. Future maintainers can at least reconstruct why.
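Here is a minimal sketch of what that might look like in TypeScript. The function `reconcileTransactions`, the `LedgerEntry` and `BankRecord` types, and the specific constraints are all hypothetical, invented for illustration; the implementation is just one possible output that honors the comment. The point is that the comment is written first and ships with the code.

```typescript
// INTENT (written before prompting; every name here is hypothetical)
//
// Requirement: match each bank record to a ledger entry by transaction id,
//   so finance can see which payments have cleared.
// Constraints:
//   - Amounts are integer cents; never compare floating-point dollars.
//   - Bank records may arrive out of order or be delivered more than once.
// Failure modes we care about:
//   - A bank record with no ledger entry must be reported, not dropped.
//   - A matching id with a different amount is a mismatch, not a match.
//   - A duplicated bank record must not be counted as cleared twice.

interface LedgerEntry { id: string; amountCents: number }
interface BankRecord { id: string; amountCents: number }

interface ReconcileResult {
  cleared: string[];    // ids present in both, amounts agree
  mismatched: string[]; // ids present in both, amounts differ
  unmatched: string[];  // bank ids with no ledger entry
}

function reconcileTransactions(ledger: LedgerEntry[], bank: BankRecord[]): ReconcileResult {
  const byId = new Map(ledger.map((e) => [e.id, e] as const));
  const seen = new Set<string>();
  const result: ReconcileResult = { cleared: [], mismatched: [], unmatched: [] };
  for (const record of bank) {
    if (seen.has(record.id)) continue; // duplicate delivery: count it once
    seen.add(record.id);
    const entry = byId.get(record.id);
    if (!entry) result.unmatched.push(record.id);
    else if (entry.amountCents !== record.amountCents) result.mismatched.push(record.id);
    else result.cleared.push(record.id);
  }
  return result;
}

const r = reconcileTransactions(
  [{ id: "a", amountCents: 1000 }, { id: "b", amountCents: 250 }],
  [
    { id: "a", amountCents: 1000 },
    { id: "a", amountCents: 1000 }, // duplicate delivery
    { id: "b", amountCents: 205 },  // amount differs from ledger
    { id: "c", amountCents: 75 },   // no ledger entry
  ],
);
console.log(JSON.stringify(r)); // {"cleared":["a"],"mismatched":["b"],"unmatched":["c"]}
```

If a later change breaks one of the listed failure modes, the comment tells the maintainer it was a deliberate requirement, not an accident of generation.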

Review for why, not just what. Code review on AI-generated code needs to ask a different question: not "does this work?" but "can I explain why this approach was chosen over the alternatives?" If you cannot answer it in the review, you have not reviewed it — you have stamped it.

Write failure tests before implementation. Define what "wrong" looks like before asking the AI to generate something. This inverts the dependency: your understanding of failure modes drives the generation, rather than the generation defining what failure even looks like.
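A sketch of that inversion, assuming a hypothetical `parseAmountCents` function that turns a user-entered amount string into integer cents. The failure cases, each with the reason it matters, are written as data before any implementation exists; the `runFailureCases` helper and the `naive` comparison are likewise invented for this example.

```typescript
// Failure cases, written BEFORE asking for an implementation.
// parseAmountCents is a hypothetical function used only for illustration.
type Case = { input: string; expected: number | "reject"; why: string };

const failureCases: Case[] = [
  { input: "", expected: "reject", why: "empty input is not zero" },
  { input: "12.345", expected: "reject", why: "sub-cent precision must not be silently rounded" },
  { input: "1e3", expected: "reject", why: "exponent notation is not a valid amount" },
  { input: "-0.50", expected: -50, why: "refunds are negative and must keep their sign" },
  { input: "0.1", expected: 10, why: "one decimal place means tens of cents" },
];

// Runs every case and returns a description of each one that failed.
// A thrown error counts as "reject".
function runFailureCases(fn: (s: string) => number, cases: Case[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    let actual: number | "reject" = "reject";
    try {
      actual = fn(c.input);
    } catch {
      // keep "reject"
    }
    if (actual !== c.expected) failures.push(`${JSON.stringify(c.input)}: ${c.why}`);
  }
  return failures;
}

// One implementation (generated or hand-written) that satisfies the cases.
function parseAmountCents(input: string): number {
  const match = /^(-?)(\d+)(?:\.(\d{1,2}))?$/.exec(input);
  if (!match) throw new RangeError(`invalid amount: "${input}"`);
  const [, sign, whole, frac = ""] = match;
  const cents = Number(whole) * 100 + Number(frac.padEnd(2, "0"));
  return sign === "-" ? -cents : cents;
}

console.log(runFailureCases(parseAmountCents, failureCases)); // []

// A plausible-looking alternative that handles ordinary inputs.
const naive = (s: string): number => Math.round(parseFloat(s) * 100);
console.log(runFailureCases(naive, failureCases));
```

The `naive` version looks reasonable on everyday amounts, but the failure cases catch it accepting an empty string, exponent notation, and sub-cent precision instead of rejecting them.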

Treat AI-generated modules as third-party dependencies. You do not necessarily understand every library you use — but you document what it does, you pin versions, you write integration tests against the interface. AI-generated code that nobody understands should get the same treatment: interface contract documented, behavior tested at the boundary, internals treated as a black box.
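A sketch of that boundary, assuming a hypothetical AI-generated module that finds bank payments with no matching ledger entry. The `UnmatchedFinder` interface, the `Payment` type, and the `checkContract` helper are illustrative names, not from any real library. The contract is documented the way you would document a vendored dependency's API, and the test touches only that contract.

```typescript
interface Payment { id: string; amountCents: number }

/**
 * Contract for a hypothetical AI-generated module, documented like a vendored library:
 * - Returns ids of bank payments with no ledger payment of the same id AND amount.
 * - Preserves the order of `bank`; reports each id at most once.
 * - Must not mutate either input.
 */
interface UnmatchedFinder {
  findUnmatched(ledger: readonly Payment[], bank: readonly Payment[]): string[];
}

// Internals treated as a black box: replaceable as long as the contract holds.
const generated: UnmatchedFinder = {
  findUnmatched(ledger, bank) {
    const keys = new Set(ledger.map((p) => `${p.id}:${p.amountCents}`));
    const out: string[] = [];
    for (const p of bank) {
      if (!keys.has(`${p.id}:${p.amountCents}`) && !out.includes(p.id)) out.push(p.id);
    }
    return out;
  },
};

// Boundary test: exercises only the documented contract, never the internals.
function checkContract(impl: UnmatchedFinder): string[] {
  const ledger: Payment[] = [{ id: "t1", amountCents: 500 }, { id: "t2", amountCents: 900 }];
  const bank: Payment[] = [
    { id: "t3", amountCents: 100 }, // no ledger entry
    { id: "t1", amountCents: 500 }, // exact match
    { id: "t2", amountCents: 901 }, // same id, wrong amount
    { id: "t3", amountCents: 100 }, // duplicate delivery
  ];
  const before = JSON.stringify([ledger, bank]);
  const problems: string[] = [];

  const result = impl.findUnmatched(ledger, bank);
  if (JSON.stringify(result) !== JSON.stringify(["t3", "t2"])) {
    problems.push(`unexpected result ${JSON.stringify(result)}`);
  }
  if (JSON.stringify([ledger, bank]) !== before) problems.push("inputs were mutated");
  if (impl.findUnmatched([], []).length !== 0) problems.push("empty inputs not handled");
  return problems;
}

console.log(checkContract(generated)); // []
```

Because `checkContract` never looks inside `generated`, the implementation can be regenerated or rewritten, and the boundary test still tells you whether the behavior you depend on survived.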

The Real Measurement Problem

The vibe coding debate tends to focus on whether the code is correct. That's the wrong frame. Correct code that nobody understands is a liability that has not yet matured.

The productivity gains from AI-assisted development are table stakes at this point. The teams that sustain those gains are the ones that figure out how to preserve understanding alongside speed — not as a performance concern, but as a maintenance and knowledge transfer concern.

That PR merged fine on Friday. The question worth asking is what it looks like in eighteen months when the original developer is gone, the requirements have shifted three times, and nobody can explain why it was written this way.

That's the cost you don't see in the sprint velocity metric.


See also: JavaScript Bloat Is an Architectural Problem, Not a Tactical One — on the similar pattern of decisions that feel fast locally and compound globally.

Photo by Daniil Komov via Pexels.