pgvector Is Already in Your Database. Most Teams Are Using It Wrong.

The pull request was eight lines. Add the pgvector extension, create a column of type vector(1536), build an index. The demo worked. Everyone moved on.
Twelve months later the query latency had crept from 50ms to 800ms. The search results were technically returning the most similar vectors but users kept reporting irrelevant hits. The team had been storing 400,000 documents. Nobody had benchmarked anything beyond the first 1,000.
This is not an unusual story. pgvector is installed on more Postgres instances than almost any other extension added in the last three years — and it's misconfigured in most of them. The mistakes are consistent, they're predictable, and they compound.
The Three Failure Points in Production pgvector
Dimensionality choices that nobody measures
The first decision teams make when adding vector search is which embedding model to use. The default answer is usually whichever model the documentation example used. For a long time, that was OpenAI's text-embedding-ada-002, which outputs 1,536-dimensional vectors.
1,536 dimensions is not better than 768 dimensions for most retrieval tasks. It's heavier.
The MTEB (Massive Text Embedding Benchmark) leaderboard hosted on Hugging Face shows that models with smaller embeddings, such as the 768-dimensional bge-base-en-v1.5 or Cohere's 1,024-dimensional embed-english-v3.0, can match or outperform text-embedding-ada-002 on retrieval tasks while needing half to two-thirds of the vector storage. Smaller models still, like the 384-dimensional all-MiniLM-L6-v2, give up some retrieval accuracy in exchange for a quarter of the storage. In a pgvector HNSW index, every vector is stored in the index alongside its graph links, so index size grows roughly in proportion to dimensionality, on top of a per-vector graph overhead governed by m.
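To put rough numbers on "heavier", the sketch below estimates raw vector storage at different dimensionalities. It assumes single-precision vectors at 4 * dimensions + 8 bytes each, the figure given in the pgvector README, and it ignores HNSW graph links and page overhead, so treat the result as a lower bound rather than a measured index size. The function names are illustrative.

```python
# Rough storage estimate for raw vectors only (no HNSW graph links, no page overhead).
# Assumption: one pgvector `vector` value takes 4 * dimensions + 8 bytes.

def vector_bytes(dimensions: int) -> int:
    """Bytes needed to store one single-precision vector."""
    return 4 * dimensions + 8


def corpus_mebibytes(rows: int, dimensions: int) -> float:
    """Raw vector storage for `rows` embeddings, in MiB."""
    return rows * vector_bytes(dimensions) / (1024 * 1024)


rows = 400_000  # corpus size from the story at the top of the article
for dims in (384, 768, 1536):
    print(f"{dims:>5} dims: {corpus_mebibytes(rows, dims):,.0f} MiB of raw vectors")
```

Run it with your own row count before committing to a model; the gap between 768 and 1,536 dimensions is easier to argue about when it is expressed in gigabytes of RAM.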
Teams almost never benchmark dimensionality. They use what they started with. When the index hits 500MB, no longer fits in memory, and queries start reading from disk, nobody connects it to the embedding choice made 18 months earlier.
The fix is to benchmark before you scale. Pick three embedding models including at least one under 768 dimensions, run them against a representative sample of your actual query corpus, and measure recall at k=10 against human-labeled relevant results. The difference in accuracy is rarely worth the difference in weight.
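A minimal sketch of that comparison, assuming you have already run each candidate model and collected ranked document IDs per query. The model names, query IDs, and document IDs below are hypothetical placeholders, not real results.

```python
from statistics import mean


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the labeled relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each query needs at least one labeled relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(results: dict[str, list[str]], labels: dict[str, set[str]], k: int = 10) -> float:
    """Average recall@k over every labeled query."""
    return mean(recall_at_k(results[query], labels[query], k) for query in labels)


# Human-labeled relevant documents per query (hypothetical).
labels = {"q1": {"d1", "d2"}, "q2": {"d7"}}

# Ranked results per query from each candidate model (hypothetical).
runs = {
    "model_a_1536d": {"q1": ["d1", "d9", "d2"], "q2": ["d3", "d7"]},
    "model_b_768d": {"q1": ["d1", "d2", "d4"], "q2": ["d7", "d3"]},
    "model_c_384d": {"q1": ["d1", "d5", "d6"], "q2": ["d8", "d3"]},
}

scores = {name: mean_recall(results, labels) for name, results in runs.items()}
for name, score in scores.items():
    print(f"{name}: recall@10 = {score:.2f}")
```

With a real sample of a few hundred labeled queries, the same loop tells you whether the larger model buys any recall at all.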
IVFFlat where HNSW belongs
pgvector ships two index types: IVFFlat and HNSW. The documentation explains both. The tutorials mostly use IVFFlat because it was added first and it's simpler to reason about.
IVFFlat works by partitioning your vectors into lists (similar to k-means clustering), then at query time searching the nearest lists. The lists parameter controls how many partitions you create; the probes parameter at query time controls how many lists you search. Accuracy goes up with more probes, latency goes up with more probes.
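As a rough illustration of how the two parameters are chosen, the sketch below applies the starting points suggested in the pgvector README (lists = rows / 1000 up to 1M rows, sqrt(rows) beyond that, and probes = sqrt(lists)) and renders the corresponding SQL. The table and column names are hypothetical, and the values are starting points to benchmark, not tuned settings.

```python
import math


def ivfflat_settings(row_count: int) -> dict[str, int]:
    """Starting values for `lists` and `probes` based on the README rule of thumb."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = math.isqrt(row_count)
    probes = max(1, math.isqrt(lists))
    return {"lists": lists, "probes": probes}


def ivfflat_sql(table: str, column: str, row_count: int) -> list[str]:
    """Render the index definition and the per-session probes setting."""
    settings = ivfflat_settings(row_count)
    return [
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {settings['lists']});",
        f"SET ivfflat.probes = {settings['probes']};",
    ]


# `documents` and `embedding` are assumed names for illustration.
for statement in ivfflat_sql("documents", "embedding", 400_000):
    print(statement)
```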
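The problem: IVFFlat accuracy degrades as your dataset grows. The list centroids are computed once, when the index is built, so vectors inserted later are assigned to partitions that may no longer reflect the data. At 100,000 vectors it's fine. At 1 million it starts missing relevant results with default settings (probes defaults to 1). The tradeoff between speed and accuracy becomes hard to tune because the two parameters interact nonlinearly with dataset size.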
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each layer is a subset of the vectors, increasingly sparse at higher layers. Query traversal starts at the top layer, narrows down through each layer to the closest candidates. Build time is slower. Memory footprint is larger. Query performance at scale is dramatically better, and accuracy holds at high dataset sizes without constant retuning.
pgvector's HNSW implementation (added in version 0.5.0) is competitive with specialized vector databases on query time in many published benchmarks. If you're running more than 100,000 vectors and you're still on IVFFlat, you should profile both index types before the dataset doubles again.
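For reference, switching index types is a small change in SQL. The helper below renders an HNSW index definition using the pgvector defaults (m = 16, ef_construction = 64). The documents table and embedding column are assumed names, and vector_cosine_ops assumes you query with the cosine distance operator.

```python
def hnsw_index_sql(
    table: str,
    column: str,
    m: int = 16,
    ef_construction: int = 64,
    opclass: str = "vector_cosine_ops",
) -> str:
    """Render a CREATE INDEX statement for a pgvector HNSW index.

    m               -- max connections per graph node (larger: better recall, more memory)
    ef_construction -- candidate list size during build (larger: better graph, slower build)
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


# Hypothetical table and column names.
print(hnsw_index_sql("documents", "embedding"))
```

Build both indexes on a copy of production data and compare recall and latency at your real row count before dropping the old one.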
The missing reranker
This is the one nobody's tutorial mentions.
Raw similarity search returns candidates based on their cosine or L2 distance in embedding space. Embedding space is a proxy for semantic similarity — not a measure of relevance to your actual query. Close in embedding space means semantically related. It doesn't mean what you wanted.
At 10,000 documents, this works well enough. At 100,000 it generates false positives. A document that uses the same vocabulary in a different context, or a document that answers a related but incorrect question, can score higher than the document you actually needed.
A reranker is a second model that takes the top-N candidates from your vector search and scores them against the original query more precisely. Cross-encoder models (BERT-family architectures that process the query and document together rather than independently) are the standard choice. They're slower than embedding similarity, which is why you don't use them on the full corpus, but on 50 candidates they typically add something like 50-100ms of latency, depending on model size and hardware, and measurably improve precision.
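The two-stage shape can be sketched as follows. The scoring function here is a deliberately crude token-overlap stand-in so the example runs without a model; in practice you would swap in a cross-encoder or a hosted rerank API that scores (query, document) pairs. The candidate texts are made up for illustration.

```python
from typing import Callable

Candidate = tuple[int, str]  # (document id, document text)


def token_overlap(query: str, document: str) -> float:
    """Toy relevance score: share of query tokens found in the document.

    A stand-in for a cross-encoder, NOT a real relevance model.
    """
    query_tokens = set(query.lower().split())
    document_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & document_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: list[Candidate],
    score_pair: Callable[[str, str], float] = token_overlap,
    top_k: int = 5,
) -> list[Candidate]:
    """Score every candidate against the query text and keep the best top_k."""
    ranked = sorted(candidates, key=lambda c: score_pair(query, c[1]), reverse=True)
    return ranked[:top_k]


# Pretend these came back from the vector index, in similarity order.
candidates = [
    (1, "resetting a user password in the admin console"),
    (2, "password policy requirements for new accounts"),
    (3, "how to reset your password from the login page"),
]
top = rerank("how to reset your password", candidates, top_k=2)
print(top)
```

The point of the design is that score_pair sees the query and the document together, which is what lets a real cross-encoder separate "same vocabulary" from "answers the question".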
Cohere's Rerank API, Jina Reranker, and the open-source cross-encoder/ms-marco-MiniLM-L-6-v2 from Hugging Face are the common options. The database indexing tradeoffs I wrote about earlier apply here too — the reranker is the second layer of your retrieval architecture, and it's the one that converts "semantically plausible" into "actually relevant."
What a Production-Ready pgvector Stack Actually Looks Like
The pattern that works at scale:
- Embedding layer: 768-dim or under, benchmarked against your specific query corpus. Batch embeddings asynchronously; don't block on them in the request path.
- Index layer: HNSW with m=16 and ef_construction=64 as a starting baseline. Tune ef_search at query time based on your latency budget vs recall requirements. Build the index once on initial population; add vectors incrementally with the understanding that HNSW may need periodic rebuilds to maintain quality.
- Retrieval layer: Retrieve top-50 to top-100 candidates from the vector index, not top-5. Give your reranker enough to work with.
- Reranking layer: Cross-encoder scores the retrieved candidates against the original query text. Return the top-5 or top-10 from the reranked list.
- Evaluation layer: This is the one almost nobody builds. You need labeled evaluation sets (query-document pairs with human-judged relevance) to actually measure whether your search is working. Without this, you have no signal that things have drifted. Vector search quality can degrade silently as your corpus changes.
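One detail worth encoding when you wire the retrieval layer: per the pgvector README, an HNSW scan returns at most hnsw.ef_search rows, and the default is 40, so asking for 50 candidates with default settings can quietly return fewer. The sketch below keeps the query-time knobs in one place and refuses inconsistent combinations. The class, table, and column names are hypothetical, and the query assumes a driver that accepts named parameters in the %(name)s style.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    n_candidates: int = 50   # rows handed to the reranker
    top_k: int = 5           # rows returned to the user after reranking
    ef_search: int = 100     # HNSW candidate list size at query time

    def __post_init__(self) -> None:
        if self.top_k > self.n_candidates:
            raise ValueError("top_k cannot exceed n_candidates")
        if self.ef_search < self.n_candidates:
            raise ValueError("ef_search must be at least n_candidates")

    def statements(self, table: str = "documents", column: str = "embedding") -> list[str]:
        """SQL to run per session/query; <=> is pgvector's cosine distance operator."""
        return [
            f"SET hnsw.ef_search = {self.ef_search};",
            f"SELECT id, body FROM {table} "
            f"ORDER BY {column} <=> %(query_vec)s::vector "
            f"LIMIT {self.n_candidates};",
        ]


config = RetrievalConfig()
for statement in config.statements():
    print(statement)
```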
The Measurement Problem
The root issue with most pgvector deployments isn't configuration — it's the absence of measurement.
Vector search introduces a new class of quality problem that didn't exist with keyword search. With keyword search, "does this result contain the query term?" is a binary question you can answer automatically. With semantic search, "is this result relevant to what the user wanted?" requires human judgment or an evaluation model.
Teams ship vector search, observe that the demos look good, and ship to production without establishing baseline recall or precision metrics. When search quality degrades — because the corpus changed, because a new class of queries doesn't match the embedding model's strengths, because HNSW quality has drifted from a high write rate — there's no signal. Users just quietly stop trusting search.
Build the evaluation layer first. Even a dataset of 500 manually labeled query-result pairs is enough to catch regressions. Deploy search only once you have a way to measure it.
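A regression check over such a dataset can be very small. The sketch below computes mean precision@k for whatever search function you pass in and compares it with a stored baseline. The labeled queries, document IDs, baseline value, and tolerance are all made up for illustration, and fake_search stands in for your real retrieval-plus-rerank call.

```python
from statistics import mean
from typing import Callable


def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Share of the top k results that a human labeled as relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def evaluate(search_fn: Callable[[str], list[str]], labeled: dict[str, set[str]], k: int = 10) -> float:
    """Mean precision@k of search_fn over every labeled query."""
    return mean(precision_at_k(search_fn(query), relevant, k) for query, relevant in labeled.items())


def has_regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when the current score falls below the baseline by more than the tolerance."""
    return current < baseline - tolerance


# Hypothetical labeled set and canned search results.
labeled = {"reset password": {"d1", "d3"}, "export invoices": {"d8"}}
canned = {"reset password": ["d3", "d4"], "export invoices": ["d8", "d2"]}


def fake_search(query: str) -> list[str]:
    return canned[query]


current = evaluate(fake_search, labeled, k=2)
print(f"precision@2 = {current:.2f}, regressed = {has_regressed(current, baseline=0.75)}")
```

Run it in CI and on a schedule against production; the scheduled run is the one that catches corpus drift.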
The Architecture Framing
pgvector is a retrieval component, not a search system. It finds candidates. The system decides what to do with them.
Teams that treat pgvector as the search layer — configure it once, ship, done — are conflating retrieval with relevance. Those are different problems that require different tools. The embedding model handles semantic representation. The index handles approximate nearest-neighbor lookups. The reranker handles relevance scoring. The evaluation framework handles quality measurement.
None of this is complicated. None of it requires switching to a specialized vector database. PostgreSQL can run a production-grade semantic search system with pgvector, a reranker model, and a modest evaluation dataset. The teams getting poor results are skipping two or three of those components and wondering why the output isn't what they expected.
The question isn't whether pgvector can handle your search. It's whether you've built a search system, or just added an extension.