SQLite in Production Isn't Simpler. It's Complicated in a Different Place.


Turso published a benchmark comparing its embedded-replica architecture against a plain remote connection. It ran 3,000 sequential inserts. The embedded replica — the "fast" local-first option — took 152 seconds. The remote connection took 17. Read-your-writes consistency checks: 48.6 seconds against 156 milliseconds. Nine times slower on inserts. Three hundred times slower on the exact workload embedded replicas exist to speed up.

Turso published this. Not a competitor. Not a critic three years late to the trend. The company selling you the architecture ran the numbers and put them in their own docs, because the failure mode only shows up under a specific write pattern, and they'd rather you learn it from a blog post than a postmortem.

That's the tell. SQLite-in-production isn't a simplicity story with an asterisk. It's a genuine trade-off wearing a simplicity costume, and the costume is doing a lot of the selling.

The Pitch Was Never Actually "No Infrastructure"

The case for SQLite at the edge — Turso, LiteFS, Litestream, the whole 2025-2026 wave — goes like this: one file, no server to patch, no connection pool to tune, replicate it to the edge and reads happen at memory speed next to the user. Compared to running Postgres with a primary, two replicas, a connection pooler, and an on-call rotation for failover, it sounds like someone finally removed a tax nobody wanted to pay.

What it actually removes is database administration. What it adds is distributed systems engineering, pushed up into your application code where your ORM used to be.

Fly.io's writeup on SQLite's write-ahead log internals documents the version of this that bites teams first: under sustained load, WAL mode can't be relied on to checkpoint itself. SQLite's automatic checkpoints are passive, which means they don't wait for readers that are still using the log, so with overlapping reads and writes the WAL may never get reset. "SQLite will not force a checkpoint on its own, and the WAL file can continue to grow" is the exact finding, meaning a service under real production traffic needs someone to explicitly run PRAGMA wal_checkpoint(TRUNCATE), on a schedule, forever, or watch disk usage climb until a query blocks. That's not "no infrastructure." That's infrastructure with the interface hidden until it fails at 2 a.m.
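For a sense of what "explicitly run the checkpoint" looks like, here is a minimal sketch using Python's standard sqlite3 module. The function names, the events table, and the temp-file path are all made up for illustration; the only real interface is the PRAGMA itself, which returns one row of (busy, log frames, checkpointed frames).

```python
import os
import sqlite3
import tempfile


def checkpoint_truncate(conn):
    """Run a TRUNCATE checkpoint; return SQLite's (busy, log_frames, checkpointed_frames) row."""
    return tuple(conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone())


def demo():
    # Hypothetical database path and schema, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "app.db")
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [("x" * 100,) for _ in range(500)],
    )
    conn.commit()

    wal_path = path + "-wal"  # SQLite keeps the WAL next to the database file
    size_before = os.path.getsize(wal_path)
    result = checkpoint_truncate(conn)
    size_after = os.path.getsize(wal_path)
    conn.close()
    return size_before, size_after, result


if __name__ == "__main__":
    before, after, (busy, log_frames, checkpointed) = demo()
    print(f"WAL bytes before={before} after={after} busy={busy}")
```

In a real service the call would sit behind whatever scheduler you already run (a cron job, a background thread), and a nonzero busy value in the first column is the signal that a reader or writer got in the way and the checkpoint did not finish.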

The Single-Writer Ceiling Nobody Puts in the Pitch Deck

Here's the constraint that doesn't show up until it's load-bearing: SQLite allows exactly one writer at a time. LiteFS — Fly's distributed layer built to work around this — still routes every write through a single primary node, and Fly's own documentation on LiteFS is candid that FUSE filesystem overhead caps sustained write throughput at roughly 100 writes per second. Not per shard. Per database.
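The one-writer rule is easy to see on a single machine, before any replication layer is involved. The sketch below uses Python's standard sqlite3 module with two connections to the same file; the table name and path are hypothetical, and timeout=0 is set on purpose so the second writer fails immediately instead of waiting its turn.

```python
import os
import sqlite3
import tempfile


def second_writer_outcome():
    """Open two connections, hold a write transaction on one, try to write on the other."""
    path = os.path.join(tempfile.mkdtemp(), "app.db")  # hypothetical path
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT below are explicit.
    # timeout=0: a blocked writer raises right away rather than retrying.
    a = sqlite3.connect(path, timeout=0, isolation_level=None)
    b = sqlite3.connect(path, timeout=0, isolation_level=None)
    a.execute("PRAGMA journal_mode=WAL")
    a.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

    a.execute("BEGIN IMMEDIATE")  # connection a now holds the write lock
    a.execute("INSERT INTO events (payload) VALUES ('from a')")
    try:
        b.execute("BEGIN IMMEDIATE")  # connection b asks for the same lock
        b.execute("ROLLBACK")
        outcome = "second writer acquired the lock"
    except sqlite3.OperationalError as exc:
        outcome = str(exc)
    a.execute("COMMIT")

    # Once a has committed, b can write without trouble.
    b.execute("INSERT INTO events (payload) VALUES ('from b')")
    count = a.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    a.close()
    b.close()
    return outcome, count


if __name__ == "__main__":
    print(second_writer_outcome())
```

With a nonzero timeout the second writer would queue instead of erroring, which is the friendlier behavior and also exactly how the ceiling shows up in production: as latency, not as an exception.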

For a blog, a docs site, a read-heavy SaaS dashboard where writes are rare and reads dominate, 100 writes/second is a ceiling you'll never touch, and the read-latency win at the edge is real and worth having. For event logging, multi-tenant concurrent inserts, anything with real-time analytics — teams building on SQLite-at-the-edge discover the ceiling exists only once concurrent writers start queuing behind it, usually a few months post-launch, usually after the architecture is load-bearing enough that ripping it out costs a quarter.

This is the actual shape of the trade-off: SQLite's simplicity is local. Single file, no schema-migration horror, trivial to reason about on one machine. Its complexity is distributed — conflict handling, WAL management, checkpoint scheduling, and a hard write ceiling that no amount of tuning dissolves, only relocates.

Read the Trade-off Before You Buy the Pitch

None of this is an argument against SQLite in production. Turso's own numbers on the read side are genuinely excellent, and that's not marketing, that's physics: a replica sitting in the same region as the request beats a round trip to us-east-1 every time. The argument is against buying "simpler" as a blanket claim when what's actually on offer is a set of trade-offs someone has to own.

The honest version of the pitch is: SQLite-at-the-edge trades database-administration complexity for application-architecture complexity, and that trade is a clear win for read-heavy, low-write-concurrency workloads and a clear loss for the opposite. Ask which one you're building before you pick the file.

What to Check Before You Commit

Three questions cut through the pitch faster than any benchmark:

What's your write concurrency, actually measured, not estimated? If it's under 50 writes/second sustained, you're at about half the roughly 100 writes/second LiteFS figure cited above, and the single-writer ceiling is unlikely to be your problem. If you're not sure, that's the first thing to instrument, before you touch the database.

Who owns the WAL checkpoint schedule? If the answer is "nobody yet," that's a production incident with a delayed timer on it.

Does your team have someone who's debugged distributed replication conflicts before? If not, you're not buying simplicity. You're buying a distributed-systems problem and hiring for it retroactively, under pressure, after the first incident makes it undeniable.
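For the first question, "measured, not estimated" can be as small as a sliding-window counter wrapped around whatever function performs your writes. This is a sketch under assumptions: the WriteRateMeter class and its parameters are invented for illustration, not part of any library, and a real deployment would more likely export the number to an existing metrics system.

```python
import time
from collections import deque


class WriteRateMeter:
    """Sliding-window counter: average writes per second over the last window_seconds."""

    def __init__(self, window_seconds=10.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable so the meter can be tested without sleeping
        self.stamps = deque()

    def record(self):
        """Call once per committed write."""
        self.stamps.append(self.clock())

    def per_second(self):
        """Drop timestamps older than the window, then average what remains."""
        cutoff = self.clock() - self.window
        while self.stamps and self.stamps[0] < cutoff:
            self.stamps.popleft()
        return len(self.stamps) / self.window


if __name__ == "__main__":
    meter = WriteRateMeter(window_seconds=10.0)
    for _ in range(25):
        meter.record()  # in a real app: call this right after each commit
    print(f"{meter.per_second():.1f} writes/second over the last 10s")
```

The number to watch is the sustained peak, not the daily average; a ceiling that is fine at the mean can still queue writers during a burst.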

The Hiring Problem Nobody Budgets For

There's a second-order cost that rarely makes it into the architecture decision doc: who's actually equipped to operate this in six months, at 2 a.m., when the WAL file has grown past disk headroom and the on-call engineer has never seen a checkpoint failure before. Postgres has two decades of institutional knowledge behind it — Stack Overflow answers, runbooks, a labor market full of people who've debugged a replication lag before. Distributed SQLite is comparatively new terrain. The docs are good. The community answering 3 a.m. questions is thinner. That's not a permanent state — it'll thicken over the next few years the way every new infrastructure layer eventually does — but it's the state teams are hiring into right now, and "we'll figure it out" is a more expensive plan for a novel failure mode than for a well-trodden one.

None of this is a reason to avoid the architecture. It's a reason to price it honestly against the alternative before the pitch meeting ends, rather than during the postmortem.

The file is simple. The system it's part of never was. The teams that do well with SQLite-in-production are the ones who read the WAL internals doc before the incident, not during it.