Prime Video Cut Costs 90% by Undoing Its Microservices. That Wasn't a Fluke.

Amazon's Prime Video team built a video-quality-analysis service the way every conference talk in 2019 told them to: small, independent, distributed. Step Functions to orchestrate, S3 buckets to shuttle frames between stages, a fleet of Lambda-backed microservices each doing one clean job. Then they hit a wall well short of the load they had planned for, and the wall wasn't viewer traffic. It was the architecture's own overhead: the orchestration layer and the S3 handoffs became both the scaling bottleneck and the biggest line items on the bill. They tore the whole thing down and rebuilt it as one process that passes data in memory. Infrastructure spend dropped 90%.
That's not a story about Amazon being bad at engineering. It's a story about an entire industry mistaking a scaling technique for a virtue, and only now doing the math on what it actually costs to hold that virtue in place.
Why Prime Video's Monolith Rebuild Actually Worked
The original architecture separated video-frame processing into discrete AWS Step Functions states, with S3 as the handoff layer between them. Every frame comparison meant writing to S3, triggering the next function, reading back from S3, and paying for every state transition along the way. That's not a bug in the design — that's what "microservices" means when you follow the pattern literally. Independent, loosely coupled, individually scalable. It's also spectacularly expensive when the unit of work is small and the handoffs are constant, because you're paying cloud-orchestration tax on every single step instead of just running the next line of code.
The fix, documented directly by the Prime Video engineering team, was to collapse the pipeline into one monolithic process and move data between stages in memory instead of through S3. Same functionality, same output quality checks, radically fewer moving parts — and a 90% reduction in infrastructure cost. No feature was cut to get there. The savings came entirely from removing coordination overhead that the distributed version required just to exist.
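To make the coordination overhead concrete, here is a minimal sketch that counts handoff operations instead of dollars. It assumes every frame passes through every stage, that each stage is one orchestrated state, and that adjacent stages exchange data through shared storage. The names `HandoffCounts`, `distributed_handoffs`, and `in_process_handoffs` are hypothetical, and the workload numbers are illustrative, not Prime Video's.

```python
from dataclasses import dataclass


@dataclass
class HandoffCounts:
    state_transitions: int  # orchestrated steps, each one billed
    store_writes: int       # intermediate objects written to shared storage
    store_reads: int        # intermediate objects read back by the next stage


def distributed_handoffs(frames: int, stages: int) -> HandoffCounts:
    """Count coordination operations when every stage is a separate step.

    Assumption: one state transition per frame per stage, plus one write
    and one read for every handoff between adjacent stages.
    """
    hops = max(stages - 1, 0)  # handoffs between adjacent stages
    return HandoffCounts(
        state_transitions=frames * stages,
        store_writes=frames * hops,
        store_reads=frames * hops,
    )


def in_process_handoffs(frames: int, stages: int) -> HandoffCounts:
    """Same work in one process: stages are function calls, data stays in memory."""
    return HandoffCounts(state_transitions=0, store_writes=0, store_reads=0)


if __name__ == "__main__":
    frames, stages = 1_000_000, 4  # hypothetical workload
    print(distributed_handoffs(frames, stages))
    print(in_process_handoffs(frames, stages))
```

The model leaves out compute, which both designs pay for. The point is only that the distributed count grows with frames times stages, while the in-process count stays at zero no matter how many frames you push through.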
This is the detail that gets lost when people retell this story as "microservices bad." Distributed architecture wasn't wrong for every workload Prime Video runs — it was wrong for a workload whose steps happen in tight sequence, at high frequency, on data too small to justify the orchestration layer sitting on top of it. The team didn't discover a universal law. They discovered that they'd applied a pattern designed for one shape of problem to a different shape of problem, and paid the difference in AWS bills.
The Industry Numbers Say This Wasn't a One-Off
If Prime Video were the only data point, this would be an anecdote. It isn't. CNCF's 2025 organizational survey found that roughly 42% of surveyed companies had consolidated microservices back into larger deployable units within the prior two years — not abandoning distributed systems outright, but shrinking the number of independently deployed pieces because the operational cost of keeping them separate stopped paying for itself. Separately, Gartner research on architecture decisions at small-to-medium application scale found around 60% of teams reporting regret over adopting microservices in the first place, citing debugging complexity and infrastructure overhead as the dominant reasons.
Those numbers describe a correction, not a fad reversal. A pattern doesn't get walked back by roughly four in ten surveyed organizations because Twitter got bored of it. It gets walked back because teams ran the pattern in production long enough to measure what it cost against what it bought them, and for a large share of workloads, the ledger didn't balance.
One engineer's account of that ledger, published by Devrim Ozcay, makes the math concrete: an 8-engineer team ran a product as a monolith in 2023 for roughly $4,200 a month. Chasing best-practice architecture, they split it into 12 services with Kubernetes, Istio, and RabbitMQ. By 2024–2025, the same feature set cost roughly $82,000 a month to run, a nearly 20x increase, and deployments took longer than the 15-minute monolith deploys they'd started with. Same team size. Same product surface. The only variable that changed was how many independently running pieces they'd agreed to operate, and every one of those pieces came with its own monitoring, its own failure mode, its own on-call page.
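The arithmetic behind that account is worth running once. This is a small sketch using only the rounded figures quoted above; the function names are hypothetical, and the per-engineer view is an extra way of slicing the same numbers, not something the original account reports.

```python
MONOLITH_MONTHLY_USD = 4_200   # 2023 monolith, as quoted above
SERVICES_MONTHLY_USD = 82_000  # 12-service setup, as quoted above
TEAM_SIZE = 8


def cost_multiplier(before: float, after: float) -> float:
    """How many times more expensive the new setup is."""
    return after / before


def extra_annual_cost(before: float, after: float) -> float:
    """Additional spend per year at the new monthly rate."""
    return (after - before) * 12


def monthly_cost_per_engineer(monthly: float, engineers: int) -> float:
    """Infrastructure spend carried by each engineer on the team."""
    return monthly / engineers


if __name__ == "__main__":
    m = cost_multiplier(MONOLITH_MONTHLY_USD, SERVICES_MONTHLY_USD)
    print(f"{m:.1f}x")  # 19.5x
    extra = extra_annual_cost(MONOLITH_MONTHLY_USD, SERVICES_MONTHLY_USD)
    print(f"${extra:,}")  # $933,600
    before = monthly_cost_per_engineer(MONOLITH_MONTHLY_USD, TEAM_SIZE)
    after = monthly_cost_per_engineer(SERVICES_MONTHLY_USD, TEAM_SIZE)
    print(f"${before:,.0f} -> ${after:,.0f}")  # $525 -> $10,250
```

Put that way, the split added a bit over $930,000 a year in infrastructure for the same eight people shipping the same features.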
I wrote recently about how the "bundling killed the indie dev tool era" story was really a story about consolidation reducing coordination tax across a different layer of the stack — fewer vendors, fewer integration points, lower operational surface area. This is the same mechanism showing up one level down, inside a single company's own infrastructure instead of across its vendor list.
What Microservices Actually Solve — And What They Don't
None of this means monoliths are correct by default either, and the teams treating this as vindication for "just build a monolith" are making the same category error in reverse. Microservices solve a real problem: independent scaling of components with genuinely different load profiles, independent deployment for teams large enough that a shared release train becomes the bottleneck, and fault isolation when one component's failure shouldn't be able to take down an unrelated one. Netflix's original streaming architecture needed that. A team of 8 engineers running one product with one release cadence almost never does.
The variable that actually predicts whether microservices pay for themselves isn't traffic volume — it's organizational size and shape. If you have four teams that need to ship independently without blocking each other, service boundaries solve a coordination problem you actually have. If you have one team debating whether to add a fourth message queue to a product that eight people maintain, you're importing organizational tooling to solve a problem your organization doesn't have yet, and you'll pay for the tooling regardless of whether the problem shows up.
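One way to keep yourself honest is to write the check down. The sketch below encodes the three problems named above (independent deployment, distinct load profiles, fault isolation) as a tiny function. Everything in it is an assumption for illustration: the `BoundaryProposal` fields, the two-team threshold, and the wording of the results. It is a conversation aid, not a formula.

```python
from dataclasses import dataclass


@dataclass
class BoundaryProposal:
    name: str
    teams_needing_independent_deploys: int  # teams blocked by a shared release
    has_distinct_load_profile: bool         # scales very differently from the rest
    needs_fault_isolation: bool             # its failure must not take others down


def constraints_relieved(proposal: BoundaryProposal) -> list[str]:
    """List the concrete constraints a proposed service boundary would relieve.

    An empty list means the boundary has no justification yet.
    """
    reasons = []
    # Assumption: independent deployment only matters with two or more teams.
    if proposal.teams_needing_independent_deploys >= 2:
        reasons.append("independent deployment across teams")
    if proposal.has_distinct_load_profile:
        reasons.append("independent scaling")
    if proposal.needs_fault_isolation:
        reasons.append("fault isolation")
    return reasons


if __name__ == "__main__":
    # Hypothetical proposal from a single eight-person team.
    queue = BoundaryProposal("fourth-message-queue", 1, False, False)
    print(constraints_relieved(queue) or "none, yet")
```

If the list comes back empty, the boundary is solving a problem the organization doesn't have.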
This is the piece the 2019-era conference-talk version of "microservices" left out entirely: the pattern was popularized by companies operating at a scale most of its adopters were nowhere near, and the parts of the pattern that make sense at that scale — dedicated platform teams, mature service-mesh tooling, distributed tracing built into the culture — don't show up for free just because you copied the topology. You get the coordination cost. You don't automatically get the org that makes the coordination cost worth paying.
So Actually, This Was Never About the Architecture
Here's the reframe: the story isn't "monoliths are back" or "microservices failed." It's that architecture decisions made by copying a pattern instead of measuring a constraint eventually get corrected by the bill. Prime Video's team, the roughly 42% of organizations in CNCF's survey, and the 8-engineer team in Devrim Ozcay's account all ran the same experiment: adopt the distributed pattern, operate it in production long enough to see its actual cost, and reverse course once that cost became undeniable instead of theoretical.
Five years from now, some other pattern will be the default. The teams that avoid repeating this mistake with it won't be the ones who memorized "monolith good, microservices bad" as the new slogan. They'll be the ones who ask, before drawing any service boundary, what specific organizational or scaling constraint that boundary is supposed to relieve, and who are willing to answer "none, yet" and build the boring thing instead.
What constraint is your service boundary actually solving, and would you be able to name it in one sentence if someone asked at the next architecture review?
Cover photo by panumas nikhomkhai via Pexels.