Your AI Agent Has No Plan B. That's Not a Feature — It's a Time Bomb.

June 4, 2026

The agent ran. It returned something. Nobody caught that it was wrong.

That's not a model quality problem. It's an architecture problem — and it's happening in production right now at companies that shipped agents without designing for failure.

Most AI agent work focuses on the happy path: the agent interprets the task, retrieves the right context, calls the right tools, and returns good output. That's what gets demoed. That's what gets shipped. What doesn't get designed is what happens when any of those steps go sideways — and they do, routinely, in production in ways that testing never surfaces.

The Three Failure Modes Nobody Ships For

There's a taxonomy worth understanding. AI agents fail in three distinct ways that require three different design responses, and most teams handle none of them.

Transient failures are the ones people think about: rate limits, timeouts, API unavailability. A tool call returns a 503. The LLM returns an empty response. These feel like classic engineering problems and they are — but agent-level retry logic is different from API retry logic. If an agent calls a research tool, gets no response, and silently skips to the next step, the downstream output isn't "failed." It's wrong. It looks like success. Retry logic for agents has to understand the semantic impact of each step, not just whether the HTTP call succeeded.

Accumulated drift is trickier. This is when the agent's internal state — its working context, its understanding of where it is in a multi-step task — diverges from reality across a long-running run. The agent thinks it's on step 4. It's actually repeating step 2 for the third time with slightly different outputs each round. Nothing returns an error. The agent completes. The output is subtly broken. There's no stack trace to read. Without a mechanism to checkpoint state and validate it against expected progression, drift is invisible until someone actually uses the output.

Dead-end entrapment happens when the agent reaches a point where it can't proceed — the tool it needs requires credentials it doesn't have, the document it's supposed to process is malformed, the user's request contains a contradiction that makes completion impossible — and instead of stopping or escalating, it improvises. It makes a decision that wasn't sanctioned. It fills in a gap with a hallucination. It completes the task in a way that technically satisfies the prompt but doesn't do what was needed. This is the most dangerous failure mode because the agent appears to have succeeded.

What "Handling" These Actually Means

Retry logic for agents isn't try: ... except: retry(). That's API client code.

Agent retry logic means: when a step fails, do I retry the step, restart from the last checkpoint, or reclassify the whole task as a different kind of problem? It requires knowing which steps are idempotent (safe to retry), which steps have side effects that make retry dangerous, and which step failures should propagate all the way up to task cancellation.

Graceful degradation means: if the agent can't complete the full task, what partial output is acceptable and what partial output is harmful? A research agent that returns findings from 4 out of 6 sources might be useful. A code-writing agent that writes 80% of the function and silently omits the validation logic might be actively dangerous. These aren't the same thing, and treating them the same way — "return what we have" — is not a degradation strategy.

Human escalation means: there are classes of situations your agent should not navigate autonomously. A task that involves irreversible actions. A situation where the agent's confidence is below a meaningful threshold. A step where the agent lacks authorization. The failure mode here isn't the agent breaking — it's the agent proceeding when it should stop and ask. Most agents don't have a mechanism for this because nobody designed the "ask a human" path. It doesn't fit neatly into the happy-path architecture.

The Monitoring Gap

The absence of failure-handling design wouldn't matter as much if teams had monitoring that surfaced silent failures. Most don't.

Token usage metrics, latency, and error rates tell you whether your agent is running. They don't tell you whether it's producing correct output. An agent that completes in 8 seconds with 4,000 tokens and no API errors could have just sent an email to the wrong person, generated a report that cited sources it invented, or approved a transaction that should have been flagged for review.

Behavioral monitoring — the kind that checks agent outputs against expected patterns, validates that the steps taken were the steps that were supposed to be taken, and flags when agent behavior diverges from baseline — is largely absent from production AI agent deployments. Teams evaluate their agents on benchmarks and test suites, then ship them into environments where the actual failure modes differ categorically from the ones the test suite covers.

The gap between "passed eval" and "performs reliably in production" is real, and it's not primarily a model quality gap. It's a failure-handling gap.

Why This Gets Skipped

It gets skipped for the same reason all failure-handling work gets skipped: it's invisible work.

Building the happy path is legible. There's a demo at the end. Building the failure path requires writing code that most stakeholders will never see execute — code whose value is entirely in preventing bad things from happening rather than producing good things. The incentive structure of most engineering organizations rewards shipping features, not shipping defenses.

There's also a category error at play. Teams that come from microservices backgrounds think of agents as services, and they apply service-level reliability patterns — health checks, circuit breakers, dead-letter queues. These patterns help, but they're not sufficient, because agents are stateful reasoning processes, not stateless request handlers. A circuit breaker that opens after three timeouts doesn't help you when your agent has spent 40 steps building toward a conclusion that's now unreachable without the one tool that's down.

The AI governance and accountability gaps that emerged through 2026 were often traced back not to model failures but to architectural gaps in how agents were deployed — no audit logs, no clear ownership of agent outputs, no defined behavior for edge cases. Retry and fallback logic is part of the same problem. It's the infrastructure that makes agents trustworthy rather than merely capable.

What the Discipline Looks Like

Teams that ship reliable agents have a few things in common.

They define failure modes before writing agent logic. They ask: what does failure look like for this specific agent? What's a transient problem vs a terminal one? What partial output is useful vs harmful? They write the failure paths first because those force the architectural decisions that the happy-path implementation often obscures.

They checkpoint state explicitly. Long-running agents write their progress to durable storage at meaningful intervals — not after every LLM call, but after semantically complete steps. Checkpoint granularity depends on the cost and consequences of replaying from a given point.

They design escalation into the agent's decision tree. There are classes of situations where the agent asks rather than proceeds. Those classes are enumerated in advance, not discovered in production.

And they monitor for behavioral drift — not just availability. They define what "normal" agent behavior looks like across a sample of production tasks and alert when outputs diverge from that baseline in structurally meaningful ways.

None of this is exotic. It's what you'd build for any critical automated system. The only reason it doesn't exist in most AI agent deployments is that people shipped before they thought it through.

The pattern shows up consistently in agent failure taxonomies: the most common production failures aren't model-level errors. They're infrastructure gaps — missing retry, missing state management, missing escalation. The model worked fine. The system around the model had no plan for when it didn't.

The agents that perform well in production aren't the smartest ones. They're the ones that know when they're wrong — and have somewhere to go when they are.

Photo by Benjamin Farren