WCAG Wasn't Built for Software That Acts Without You


A screen-reader user in Portland asks her AI assistant to book a Tuesday flight to Denver, economy, aisle seat, under $300. Four days later, at the gate, an airline employee tells her the ticket is for Wednesday, a middle seat, on a connection she never approved. Nobody narrated any of that to her while it happened. The assistant booked the flight the way it books everything: fast, multi-step, and silent. She had no way to catch the error mid-flow, because there was no flow she could see.

That's not a bug report. That's the accessibility model breaking at the foundation.

WCAG was written for a specific kind of user: someone who perceives an interface, decides what to do next, and acts on that decision themselves. Every success criterion in the guidelines assumes this closed loop between a person and a screen. Autonomous AI agents don't live inside that loop. They plan, decide, and execute across many steps without a human confirming each one. So the accessibility question stops being "can this person perceive and operate the interface" and becomes something WCAG has no vocabulary for: can the thing acting on someone's behalf be trusted not to wreck their day while they aren't watching.

What WCAG Actually Assumes About Who's Acting

Read the four WCAG principles closely and a hidden assumption sits under all of them. Perceivable means a human can take in the information. Operable means a human can work the controls. Understandable means a human can follow what's happening. Robust means assistive technology can faithfully hand information to that human. Every one of those verbs points at a person who both perceives and decides.

An aria-live region is a good example. It announces a page update to a screen-reader user so they can decide their next move. That works when the user is the one making the next move. It's close to meaningless when an agent has already made and executed five moves before the live region finishes announcing the first one. Focus order matters because a keyboard user tabs through it deliberately, choosing where to land. An agent doesn't tab. It targets an element directly through the accessibility tree, and the visual and semantic structure a sighted developer built for a human wayfinder is irrelevant to how the agent gets there.

None of this makes WCAG wrong. For twenty-five years it correctly modeled the only actor in the interface: the person in front of it. It just never anticipated a second actor showing up who reads the same accessibility tree, obeys none of the same pacing, and reports back only after the fact.

Compliance testing inherits the same blind spot. An automated scanner checks whether a button has an accessible name, whether contrast ratios pass, whether the DOM order matches the visual order. It never asks whether the entity reading that accessible name is the same entity the guideline was written to protect. A page can pass every automated and manual WCAG audit a team runs and still be a page an agent misreads on a disabled user's behalf, because passing the audit was never a claim that the page is safe to hand over to an agent.

What Breaks When the Agent Fills the Form

Consider a motor-impaired user who hands a forty-field government benefits form to an agent because tabbing through it manually costs her real physical pain. The form has a radio button pair labeled ambiguously enough that a sighted mouse user would resolve the ambiguity instantly from layout alone, a WCAG 4.1.2 gap most audits would flag as minor. The agent, working from the accessible name rather than the visual context, picks the wrong option and submits her into the wrong benefits category. She finds out three weeks later, in a denial letter.

Or take a cognitively disabled user who has an agent complete a weekly grocery order. Midway through checkout, the agent encounters a subscription toggle whose accessible name doesn't match what's rendered on screen, a dark pattern built to survive automated accessibility testing precisely because it technically passes. The agent reads the accessible name, not the deceptive visual, and enrolls her in a recurring charge she never wanted and won't notice until her card statement arrives.

Both failures happen at exactly the point WCAG was supposed to protect. And in both cases, the person the guideline exists for finds out last, because the entire safeguard was designed around a user watching in real time, not one who delegated the watching to something else. Neither of these is a hypothetical edge case dreamed up for a blog post. They're the ordinary shape of what happens when a system optimized for task completion meets a form built by someone who never imagined the reader would be software.

The paper "Position: Assistive Agents Need Accessibility Alignment" names this precisely: an agent can satisfy every accessibility criterion an auditor checks and still strip the user of the thing accessibility was supposed to guarantee them, which is control over what happens to them. The paper's authors frame it as a structural claim, not a complaint about sloppy implementation. Assistive technology was built to translate an interface faithfully to a human who then decides. An assistive agent translates and decides. Those are different jobs, and WCAG only ever specified the first one.

What Accessibility Alignment for Agents Actually Requires

"Alignment" here isn't a metaphor borrowed loosely from AI safety talk. The arXiv paper argues assistive agents need alignment in the literal sense: optimized not just to complete the task, but to preserve the disabled user's ability to steer, interrupt, and understand what's happening to them, even when they can't monitor every step. That means checkpoints before irreversible actions, calibrated by consequence rather than by task type. It means narrating intent before execution instead of only logging what already happened. It means an interrupt that costs the user almost nothing to trigger, because someone with limited motor control or limited working memory won't use an override that takes twelve seconds and three menus to reach.
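The three requirements in that paragraph can be sketched as a small execution loop: narrate each step before running it, pause for confirmation when the consequence is high enough, and check for an interrupt before every step. This is an illustration under stated assumptions, not the paper's method. Every name here (Consequence, Step, run_plan, and the narrate, confirm, and interrupted callbacks) is hypothetical, and the hard part, making the interrupt cheap for the user to trigger, lives in the interface rather than in this loop.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List


class Consequence(IntEnum):
    """Hypothetical consequence levels, ordered from least to most severe."""
    REVERSIBLE = 0    # e.g. filling a field that can still be edited
    COSTLY = 1        # e.g. adding a paid item to a cart
    IRREVERSIBLE = 2  # e.g. submitting a form or paying


@dataclass
class Step:
    description: str
    consequence: Consequence
    execute: Callable[[], None]


def run_plan(
    steps: List[Step],
    narrate: Callable[[str], None],
    confirm: Callable[[Step], bool],
    interrupted: Callable[[], bool],
    threshold: Consequence = Consequence.COSTLY,
) -> List[str]:
    """Run steps in order and return the descriptions of those executed."""
    done: List[str] = []
    for step in steps:
        # The interrupt is checked before every step, not only at checkpoints.
        if interrupted():
            narrate("Stopped at your request before: " + step.description)
            break
        # Intent is narrated before execution, not logged afterwards.
        narrate("About to: " + step.description)
        # The checkpoint depends on consequence, not on the kind of task.
        if step.consequence >= threshold and not confirm(step):
            narrate("Not confirmed, stopping before: " + step.description)
            break
        step.execute()
        done.append(step.description)
    return done
```

In the flight example, a payment step marked IRREVERSIBLE would stop and wait for the user, while filling in a passenger name would be narrated and run without a prompt. Where to set the threshold is a design decision this sketch leaves open.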

Google Research's piece on how AI agents can redefine universal design to increase accessibility makes the optimistic case: agents built with this in mind from the start remove barriers that decades of retrofitted compliance never touched, navigating hostile layouts, translating dense forms, doing the labor that WCAG conformance never actually eliminated. A page with terrible heading structure and no landmark regions is a nightmare for a screen reader and often a non-event for an agent parsing the DOM directly. That's a genuine gain, and it's worth taking seriously instead of dismissing as marketing.

But the same capability that lets an agent route around bad markup is the capability that lets it route around the user too. Universal design, in the old sense, meant building one interface that worked for everyone without a translation layer in between. Agents reintroduce the translation layer by design, and the promise only holds if that layer stays accountable to the person it's acting for. That promise is real. It's also conditional on treating alignment as the design brief, not as a patch applied after the agent already works.

We've written before about the broader open question of how anyone should design interfaces for AI agents in general: trust calibration, transparency, override points, the whole unsettled vocabulary of agentic UX. That question is real and still wide open. This one sits underneath it. It's not about how an agent's interface should look. It's about what happens to a person who can't independently verify that look in the first place.

The Most Vulnerable Users Are the Intended Ones

The instinct is to treat accessibility-under-agents as a niche concern, something to solve after the general agentic UX patterns settle. That gets the priority backwards. The people who benefit most from an agent doing the clicking are, by definition, people who face friction doing some part of the clicking themselves. Screen-reader users, motor-impaired users, and cognitively disabled users are the exact population agents are marketed to help first. They are also the exact population with the least capacity to catch an agent's mistake before it becomes irreversible, precisely because catching mistakes in real time was the accommodation they needed the agent for in the first place.

So the population WCAG was written to protect is the population an unaligned agent endangers most, and it's the same population most likely to adopt agents fastest, because the alternative is doing everything manually at real cost. That's not an edge case sitting at the margins of agentic design. It's the central case. Accessibility-under-autonomy belongs next to hallucination rate and prompt-injection robustness in how an agent gets evaluated before shipping, not in an appendix to a WCAG checklist that assumes a human is always the one clicking.
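If accessibility-under-autonomy is going to sit next to those other numbers, it needs something measurable. One possible shape, offered only as a sketch: from a log of an agent run, compute the share of irreversible actions that were preceded by a user confirmation. The metric, the function name confirmed_irreversible_rate, and the event format (dicts with a "kind" of "confirm" or "act" and an optional "irreversible" flag) are all assumptions made for illustration, not an established benchmark.

```python
from typing import Iterable, Mapping, Optional


def confirmed_irreversible_rate(events: Iterable[Mapping]) -> Optional[float]:
    """Fraction of irreversible actions preceded by a user confirmation.

    Hypothetical metric. A confirmation covers only the next irreversible
    action. Returns None when the log contains no irreversible actions.
    """
    total = 0
    confirmed = 0
    has_confirmation = False
    for event in events:
        kind = event["kind"]
        if kind == "confirm":
            has_confirmation = True
        elif kind == "act" and event.get("irreversible", False):
            total += 1
            if has_confirmation:
                confirmed += 1
            has_confirmation = False  # one confirmation, one action
    return confirmed / total if total else None
```

A number like this says nothing about whether the confirmation prompt was itself understandable to the user, so it would be a floor for evaluation rather than the whole of it.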

The next accessibility audit that matters won't ask whether a screen reader can read the button. It'll ask whether the thing that clicked the button asked first.

Cover photo by Moe Magners via Pexels.