Vibe Coding Is a Prototype Strategy, Not a Deployment Strategy


The README said "vibe coded." The production incident report said 13 hours offline.

In February 2025, Andrej Karpathy posted an X thread describing his workflow: Cursor Composer, Claude Sonnet, SuperWhisper, zero diff review, full trust in the model's output. He called it "vibe coding" — building something by feel rather than by architecture. The tech world took this as permission. What nobody noticed was that Karpathy was talking about personal projects. When companies applied the same approach to production deployments, the failure mode wasn't embarrassing. It was expensive.

What Karpathy Actually Said

The original post (February 2, 2025, x.com/karpathy/status/1886192184808149383) was specific: "forget that the code even exists." Cursor handles the rest. No architectural oversight. Accept All on every diff. The workflow made sense for what he was building: small, solo, throwaway. He wasn't describing how to ship software to 120,000 customers.

The problem is that "vibe coding" as a term traveled faster than its context. By mid-2025, startup Twitter had absorbed it as a philosophy rather than a workflow description. Teams started shipping Cursor-generated code without review. Architecture became optional. The diff became an obstacle, not a guard.

The distinction matters: Karpathy's vibe coding is a solo workflow for a system one person fully understands. The moment a second engineer, a second service, or a second codebase enters the picture, the context the model is coding for changes. The model doesn't know what it doesn't know about your system.

The Production Number Nobody Quotes Correctly

Lightrun's 2026 State of AI-Powered Engineering Report surveyed engineering teams across 300+ companies. The finding: 43% of AI-generated code changes require manual debugging in production after passing QA and staging. That's not a warning about code quality in isolation. That's a production failure rate: the checks designed to catch the problem didn't catch it.

The reason is structural. AI coding tools generate code that's coherent within the slice they can see: the local function, the current file, the visible context window. They can't see the service dependency your authentication layer takes for granted. They don't know the undocumented configuration assumption three engineers have honored for two years. They write correct code for a system they've never actually encountered.

QA passes because the tests check what the code does, not what the system expected it to do. The deployment fails because the tests didn't model the constraint that mattered.

This is why the 43% number is higher than most teams expect. The issue isn't that AI writes obviously wrong code. It writes code that passes every check, then fails against the one constraint the tests never modeled: the production system itself.
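A minimal sketch of that failure shape, in Python with entirely hypothetical names and no claim about any real incident: the function does exactly what the task specified, the unit test passes, and the constraint that actually matters lives outside anything the model or the test can see.

```python
# Hypothetical example: AI-generated cleanup code that is locally correct.
# The task said "delete sessions older than 30 days" -- and it does.
from datetime import datetime, timedelta, timezone


def purge_stale_sessions(store: dict, max_age_days: int = 30) -> int:
    """Delete sessions idle longer than max_age_days; return count removed."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    stale = [sid for sid, s in store.items() if s["last_seen"] < cutoff]
    for sid in stale:
        del store[sid]
    return len(stale)


def test_purge_removes_old_sessions():
    # The test checks what the code does -- and the code passes.
    now = datetime.now(timezone.utc)
    store = {
        "old": {"last_seen": now - timedelta(days=45)},
        "new": {"last_seen": now - timedelta(days=1)},
    }
    assert purge_stale_sessions(store) == 1
    assert "old" not in store and "new" in store


# What neither the model nor the test can see: an undocumented billing
# reconciliation job reads sessions for 90 days after they go idle.
# The function is correct for the task and wrong for the system, and
# that gap only surfaces in production.
```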

The Amazon Case

In December 2025, Amazon's internal AI coding assistant, codenamed Kiro, deleted a live AWS environment. The agent executed a cleanup task autonomously, interpreted production infrastructure as disposable, and removed it. The outage lasted 13 hours.

Three months later, Amazon published a policy change: all AI-generated code requires senior engineer sign-off before deployment. That made it the first Fortune 500 company to publicly institute an explicit architectural review gate for AI-generated code.

The March 2026 follow-ups were worse: two outages in the same week, both traced to AI code merged without architectural review. Six hours of downtime and 120,000 lost orders on March 2, then $6.3 million in lost orders on March 5. The engineers involved had applied Karpathy's workflow to a codebase Karpathy had never heard of, at a scale that made the failure catastrophically visible.

The pattern in both incidents was the same: the AI wrote code that was locally correct — it did what the immediate task specified — and globally wrong, because the immediate task didn't capture the system constraint the code violated.

The Category Error

Vibe coding fails in production for the same reason it works for prototypes. The value proposition is speed through trust: trust the model, skip the review, ship fast. That's valid when the cost of being wrong is a deleted side project. It's a different calculation when the cost is a 13-hour outage.

Karpathy's workflow is for one person who understands the full system, working on something he can delete. Production software is the opposite: multiple contributors, partial understanding, no deleting anything without a change window.

The misapplication isn't naivety. It's the classic error of scaling a personal best practice past the context that made it work. What operates cleanly at solo speed doesn't operate cleanly at system scale — not because the tool changes but because the system does.

Related: You Don't Understand Your Own Codebase Anymore covers the comprehension problem from the other direction — senior engineers losing the architectural context that review requires. The two problems compound: you can't review AI-generated code well if you've already lost the architectural understanding the review depends on.

The Version That Actually Works

The companies successfully integrating AI coding tools at scale have a different pattern. They use AI generation heavily at the function and file level. They keep humans in the loop at the architectural boundary — where code touches another service, where a decision will be expensive to reverse, where the local context is insufficient to reason about the full system.
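What that boundary looks like in practice is often a CI gate rather than a policy document. Below is a minimal sketch, not any company's actual pipeline: the service directories, paths, and merge rules are hypothetical, and a real gate would key off ownership metadata rather than path prefixes. Changes confined to one service take the standard review path; a diff that spans services or touches infrastructure blocks until a senior engineer signs off.

```python
# Hypothetical CI check: block auto-merge when a diff crosses an
# architectural boundary. All paths and rules here are illustrative.
import subprocess
import sys

# Directories treated as separate services: edits confined to one can be
# reviewed with local context; edits spanning two cannot.
SERVICE_ROOTS = ("services/auth/", "services/billing/", "services/orders/")

# Paths where a wrong change is expensive to reverse.
HIGH_BLAST_RADIUS = ("infra/", "migrations/", "shared/contracts/")


def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed on this branch relative to the base branch."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_architectural_review(files: list[str]) -> bool:
    """True if the diff spans services or touches high-blast-radius paths."""
    services = {root for f in files for root in SERVICE_ROOTS if f.startswith(root)}
    touches_infra = any(f.startswith(p) for p in HIGH_BLAST_RADIUS for f in files)
    return len(services) > 1 or touches_infra


if __name__ == "__main__":
    if needs_architectural_review(changed_files()):
        print("Diff crosses an architectural boundary: senior sign-off required.")
        sys.exit(1)  # fail the check until a human approves
    print("Change is local to one service: standard review path.")
```

The point of the sketch is where the gate sits, not how it's implemented: the model generates freely inside a boundary a human has already reasoned about, and a human re-enters the loop the moment a change crosses one.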

This is slower than vibe coding. It is also faster than debugging production at 2 a.m. with $6 million in orders on the line.

The relevant question isn't "can AI write the code?" It can. The question is "does the AI know what system it's writing for?" It doesn't. That gap is where the 43% lives — and why Amazon's approval gate won't stay unusual. The production failure rate will push every serious engineering organization toward the same conclusion: AI code generation and architectural review are not alternatives. They're partners.

Vibe coding works. Just not for production. The term Karpathy coined is real — it describes a genuine workflow shift. The mistake is treating it as a deployment strategy rather than a prototyping philosophy. Prototypes you can delete. Outages you can't.

Photo: Daniil Komov / Pexels