Nobody Knows Who Owns the AI Code That Just Broke Production

The page comes in at 2:14am. Your phone buzzes. Production is down. The error trace points to a service that was refactored three weeks ago, mostly by an AI coding assistant, reviewed by a developer who was managing four other pull requests that day. Security says it's a dev problem. Dev says the code passed review. The reviewer says the vulnerability wasn't detectable without running specific test cases that nobody wrote.

Who do you fire?

That's not a hypothetical. It's the question that a growing number of engineering leaders are sitting with right now — and the honest answer is: nobody has built the governance structure to answer it cleanly.

One in Five Companies Now Traces a Breach to AI-Generated Code

The 2026 Aikido Security State of AI in Security & Development report surveyed 450 CISOs and senior developers. The headline number: 20% of EU companies and 43% of US companies had experienced a serious security incident they traced directly to AI-generated code.

That's not fringe. Even at the low end, one in five organizations running AI coding tools at scale reports a breach with a clear AI-generation connection; in the US, it's closer to two in five.

More interesting than the incident rate is what the survey revealed about accountability. Respondents were asked to identify who bore primary responsibility when AI-generated code caused an incident. The results:

  • 53% said the security team
  • 45% said the developer who used the AI tool
  • 42% said the code reviewer who approved the PR

This was a multi-select question. Respondents could name multiple parties. The fact that all three percentages are high simultaneously doesn't mean accountability is being "shared" — it means every team is pointing at every other team and nobody has a canonical answer.

In practice, that means nobody owns it. And "nobody owns it" in a post-incident review is how people walk away from serious production failures without anything structurally changing.

Why Traditional Code Accountability Doesn't Map

The old mental model for code accountability was relatively clean: someone wrote the code, someone reviewed it, someone approved the deploy. If it broke, you traced the chain. The PR author was the origin, the reviewer was the gate, and the deploy approver accepted the risk.

AI coding tools break every assumption in that chain.

The author isn't really the author anymore — they accepted a suggestion, modified it partially, maybe regenerated it once or twice. The model that generated the suggestion was trained on a corpus that may include security-vulnerable patterns. The developer reviewing their own AI-generated code faces a cognitive trap: the code looks syntactically correct and follows familiar patterns, so the brain pattern-matches to "safe" and moves on. Reviewers checking someone else's AI-generated code have no context for what was originally suggested versus what was changed.

The deployment approver is trusting a review process that was designed for human-written code, one where the author understood every line because they wrote every line, and the reviewer was checking work someone had already reasoned through. That assumption is gone.

None of the traditional accountability structures were designed for a workflow where the "author" is a model with no liability and the "reviewer" is a human who may have checked 47 PRs that week.

The Governance Layer Nobody Built

Engineering teams adopted AI coding tools fast. The productivity case was immediate and obvious. The governance case was slower, less sexy, and easy to defer.

Most teams that ship AI-generated code at scale still don't have:

An AI code tagging policy. Is there a way to identify which commits were substantially AI-generated? If there's no audit trail, post-incident attribution is guesswork. A sketch of one lightweight approach follows this list.

Differentiated review standards. AI-generated code may require a different review checklist than human-written code — specifically one focused on security patterns the model is known to generate incorrectly (injection vulnerabilities, credential handling, timing attacks). If reviewers are applying the same process to AI output as they apply to junior engineer output, they're missing the class of errors AI tools produce at higher rates.

Clear escalation paths. When the AI suggested something and the developer shipped it and the reviewer approved it, where does the post-incident conversation start? The current default is everyone points at everyone else until leadership makes an arbitrary call, which destroys team trust and doesn't fix the underlying process.

Ownership of the model's behavior. Which team is responsible for understanding what kinds of errors a given AI coding tool introduces? In most organizations, the answer is nobody — the tool is treated like a smart autocomplete rather than a system that needs its failure modes characterized.
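
To make the first of those concrete: one lightweight audit trail is a commit-message trailer that authors add whenever a meaningful share of a change came from an AI tool, plus a script that reports how consistently the trailer is being used. The sketch below is illustrative only; the `AI-Assisted:` trailer name, the example value, and the 200-commit window are assumptions a team would define for itself, not an existing standard.

```python
# Sketch: auditing commits for a hypothetical "AI-Assisted" trailer.
# Assumes a team convention where commits with substantial AI assistance
# carry a trailer such as:
#   AI-Assisted: copilot (accepted-with-edits)
import subprocess
from collections import Counter

TRAILER = "AI-Assisted:"          # hypothetical trailer name, set by team policy
FIELD, RECORD = "\x1f", "\x1e"    # control-character separators for safe parsing


def ai_tagging_report(rev_range: str = "HEAD~200..HEAD") -> Counter:
    """Count commits in rev_range that do or don't carry the AI trailer."""
    # Assumes the repo has enough history to cover rev_range.
    log = subprocess.run(
        ["git", "log", f"--format=%H{FIELD}%B{RECORD}", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout

    counts: Counter = Counter()
    for record in filter(str.strip, log.split(RECORD)):
        _sha, _, message = record.partition(FIELD)
        tagged = any(line.startswith(TRAILER) for line in message.splitlines())
        counts["tagged" if tagged else "untagged"] += 1
    return counts


if __name__ == "__main__":
    report = ai_tagging_report()
    total = sum(report.values()) or 1
    print(f"AI-tagged commits: {report['tagged']}/{total} "
          f"({report['tagged'] / total:.0%}) in the examined range")
```

A trailer like this is also what makes the later steps cheaper: routing AI-heavy PRs to specialist reviewers, and answering "was this AI-generated?" in a post-incident review without guesswork.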

The Organizations Already Figuring This Out

Some teams are ahead of this. The common patterns in organizations that have gotten past the accountability gap:

They assign a named owner for AI tooling — usually a staff engineer or principal whose job includes tracking known failure patterns in the tools the team uses and updating review guidelines accordingly.

They treat AI-generated code review as a specialist task rather than generic review. PR review for AI-heavy refactors is assigned to engineers with security context, not just whoever is next in the rotation.

They maintain a lightweight incident taxonomy that tracks whether a given production issue was connected to AI-generated code. This isn't about blame — it's about building an organizational evidence base for which categories of work need more review scrutiny.

They've defined the consent boundary for AI tools: which parts of the codebase are off-limits to AI assistance (authentication, cryptographic operations, billing logic) and who can approve exceptions.
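
One way to make that boundary more than a wiki page is a small CI check that fails when an AI-assisted change touches restricted paths without an approved exception. The sketch below is a minimal illustration: the `--ai-assisted` flag (in practice this would come from a PR label or the commit trailer above), the restricted path globs, and `origin/main` as the comparison base are all assumptions, not prescriptions.

```python
# Sketch: a consent-boundary check for CI, under the assumptions named above.
import fnmatch
import subprocess
import sys

# Hypothetical policy: paths where AI assistance needs an explicit exception.
RESTRICTED_GLOBS = [
    "src/auth/*",
    "src/crypto/*",
    "src/billing/*",
]


def changed_files(base: str = "origin/main") -> list[str]:
    """Files touched by the current branch relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def restricted_hits(files: list[str]) -> list[str]:
    """Changed files that fall inside the consent boundary."""
    return [
        f for f in files
        if any(fnmatch.fnmatch(f, glob) for glob in RESTRICTED_GLOBS)
    ]


if __name__ == "__main__":
    # CI would derive this from the PR's labels; here it is just a CLI flag.
    ai_assisted = "--ai-assisted" in sys.argv
    hits = restricted_hits(changed_files())
    if ai_assisted and hits:
        print("AI-assisted change touches restricted paths (needs an approved exception):")
        print("\n".join(f"  {path}" for path in hits))
        sys.exit(1)
    print("Consent boundary check passed.")
```

The exception path, meaning who is allowed to approve a change that trips this check, is the part each organization still has to decide for itself.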

None of this is complicated. It's just not being done by the majority of organizations that are shipping AI-generated code at volume.

The Actual Risk Is the Governance Debt

The AI code quality research from MSR 2026 found that AI-generated code ships technical debt at higher rates than code written from scratch, not because the models are bad but because the review process doesn't catch what the models get wrong. The accountability gap is the same problem from a different angle.

You can have the most capable AI coding tool on the market and still create serious organizational risk if the governance around it is three years behind the tooling. The tools improved faster than the org structures designed to catch what they get wrong.

The 2am wake-up call is coming for more teams. The difference between the teams that recover cleanly and the teams that spend six months in post-incident dysfunction will come down to whether they built the accountability infrastructure before or after the breach.

Retrofitting governance after the incident is possible. It's just significantly more expensive than building it when you adopt the tool.

Photo: Lukas Blazek