AI Engineering, Infrastructure · 17 min read

Vector Databases 2026: pgvector vs Pinecone vs Qdrant

Compare pgvector, Pinecone, Qdrant, Weaviate, and Milvus on indexing, filtering, scale, and cost to pick the right vector database for RAG.

Vector Database Comparison 2026: pgvector vs. Pinecone vs. Qdrant vs. Weaviate vs. Milvus

TL;DR: pgvector, Pinecone, Qdrant, Weaviate, and Milvus are the five vector databases most teams evaluate for retrieval-augmented generation in 2026. pgvector is the pragmatic default when your data already lives in PostgreSQL. Pinecone is the zero-ops managed service for teams that never want to touch infrastructure. Qdrant is the Rust-built performance choice for filter-heavy workloads. Weaviate bundles embedding generation and native hybrid search. Milvus is the distributed engine built to serve billions of vectors. Choose the one whose operational model, not its benchmark chart, matches your team.

Key Takeaways

  • The first real decision is not "which vector database" but "do I need a dedicated one at all" — if your data is already in PostgreSQL, pgvector avoids an entirely separate system to operate, secure, and keep in sync.
  • Nearly every option converges on the same core index, HNSW (Hierarchical Navigable Small World), so raw approximate-nearest-neighbor speed rarely decides the winner; filtering, scaling, and operations do.
  • HNSW holds its graph in RAM, so memory is the hidden cost: one million 1536-dimension `float32` vectors need roughly 6 GB before index overhead — quantization can cut that by 4x to 32x.
  • Metadata filtering strategy (pre-filter vs. post-filter) matters more than most benchmarks show, because a naive post-filter can silently return fewer results than you asked for.
  • Managed services (Pinecone, Zilliz Cloud, Weaviate Cloud, Qdrant Cloud) trade dollars for eliminated operational burden; self-hosting the open-source engines trades engineering time for control and data residency.
  • Milvus and Qdrant scale horizontally to billions of vectors, Weaviate offers strong multi-tenancy, Pinecone auto-scales as a serverless black box, and pgvector scales with the Postgres node it runs on.

What Is a Vector Database, and Why Not Just Add an Index?

A vector database stores high-dimensional embeddings — the numeric fingerprints that models produce for text, images, audio, or code — and answers one core question fast: which stored vectors are closest to this query vector? That "closest" is measured by cosine similarity, dot product, or Euclidean (L2) distance, and it is the retrieval half of every retrieval-augmented generation (RAG) system, semantic search feature, and long-term agent memory. If you are new to why retrieval quality dominates agent behavior, our guide on context engineering for AI agents explains how retrieved chunks become the model's working memory.

The naive approach — compute the distance from your query to every stored vector and sort — is called a flat or brute-force search. It returns exact results, and for a few thousand vectors it is perfectly fine. It falls apart at scale: comparing one query against ten million 1536-dimension vectors is roughly fifteen billion floating-point multiplications per query. Approximate nearest neighbor (ANN) indexes exist to trade a sliver of recall for orders-of-magnitude speed, and building, tuning, and serving those indexes at scale is precisely what a vector database does that a plain ORDER BY distance cannot.
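The flat search described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not any engine's implementation; the function names and the three-document toy corpus are assumptions made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flat_search(query, corpus, k):
    """Exact top-k: score every stored vector, then sort by similarity."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

def multiplications_per_query(num_vectors, dims):
    """Dot-product multiplications one flat scan performs for a single query."""
    return num_vectors * dims

# Hypothetical toy corpus: document id -> embedding.
corpus = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
top2 = flat_search([1.0, 0.0, 0.0], corpus, k=2)   # ["doc-a", "doc-b"]
ops = multiplications_per_query(10_000_000, 1536)  # 15,360,000,000
```

The work grows linearly with corpus size, which is why the scan is fine for thousands of vectors and impractical for tens of millions.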

The confusion in 2026 is that "vector database" now spans two very different things: a dedicated engine built from scratch for vector search (Pinecone, Qdrant, Weaviate, Milvus) and a vector extension bolted onto an existing database (pgvector for PostgreSQL). Both answer the same query. They differ enormously in what you have to operate.

How Do the Five Vector Databases Compare at a Glance?

Treat this table as the shape of the decision, not the final answer. The rest of this guide explains why each row matters in production.

What Index Algorithms Power Each Database?

Almost every vector database in this comparison offers HNSW as its primary index, which is why head-to-head ANN benchmarks tend to cluster closely. HNSW builds a multi-layer graph where each node links to its nearest neighbors; a search walks the graph greedily from the top layer down, typically reaching above 95% recall at high queries-per-second rates. Two build parameters, `m` (connections per node) and `ef_construction` (candidate list size during build), plus the query-time `ef_search`, control the recall/speed/memory trade-off. Those are pgvector's names; the other engines expose the same three knobs under slightly different spellings.
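To make the greedy walk concrete, here is a deliberately simplified sketch: a single graph layer and a candidate list of size one. Real HNSW repeats this descent across several layers and keeps `ef_search` candidates at the bottom layer, so treat this as an illustration of the idea only. The graph, vectors, and function names are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance (ordering matches true L2 distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(query, vectors, neighbors, entry_point):
    """Move to the closest neighbor until no neighbor is closer than the current node."""
    current = entry_point
    current_dist = squared_l2(query, vectors[current])
    while True:
        best = min(neighbors[current], key=lambda n: squared_l2(query, vectors[n]))
        best_dist = squared_l2(query, vectors[best])
        if best_dist >= current_dist:
            return current  # local minimum: no neighbor improves on this node
        current, current_dist = best, best_dist

# Hypothetical toy graph: six points on a line, each linked to nearby points.
vectors = {i: [float(i), 0.0] for i in range(6)}
neighbors = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

nearest = greedy_search([4.2, 0.0], vectors, neighbors, entry_point=0)  # 4
```

The walk visits nodes 0, 2, 3, and 4 and never computes the distance to node 1's full neighborhood, which is the source of both the speed and the small recall loss.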

HNSW's defining weakness is memory. The graph lives in RAM for fast traversal, so cost scales with your corpus:

One million 1536-dimension float32 vectors is 1,000,000 × 1536 × 4 bytes ≈ 6.1 GB of raw vectors, before HNSW's graph edges add roughly another 20–40%.
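The arithmetic is easy to rerun for your own corpus. The sketch below reproduces the figure above and applies the 20–40% graph overhead quoted in this article; the overhead range is the article's estimate, not a measured value, and the function names are illustrative.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_dim=4):
    """Bytes needed for the vectors alone (float32 is 4 bytes per dimension)."""
    return num_vectors * dims * bytes_per_dim

def with_graph_overhead(raw_bytes, overhead_fraction):
    """Add an assumed HNSW graph overhead expressed as a fraction of raw size."""
    return raw_bytes * (1 + overhead_fraction)

raw = raw_vector_bytes(1_000_000, 1536)
raw_gb = raw / 1e9                               # about 6.1 GB
low_gb = with_graph_overhead(raw, 0.20) / 1e9    # about 7.4 GB
high_gb = with_graph_overhead(raw, 0.40) / 1e9   # about 8.6 GB
```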

This is where the index-type differences become real money. Three techniques attack that cost:

  • IVF (inverted file) partitions vectors into clusters and searches only the nearest few, cutting compute at some recall cost. Milvus and pgvector (IVFFlat) offer it; it is memory-light but needs periodic retraining as data drifts.
  • DiskANN keeps the bulk of the graph on SSD instead of RAM, trading a little latency for a massive drop in memory cost — the key to billion-scale on affordable hardware. Milvus supports it natively; pgvector gets it through the pgvectorscale extension's StreamingDiskANN.
  • Quantization compresses each dimension. Scalar quantization (`float32` → `int8`) cuts memory ~4x; binary quantization (→ 1 bit) cuts it ~32x, taking that 6.1 GB corpus down to roughly 190 MB at the cost of recall you recover with a re-ranking pass. Qdrant, Weaviate, and Milvus all ship scalar, product, and binary quantization.
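The quantization bullet can be illustrated with a small sketch. It uses a fixed value range and unsigned one-byte codes for simplicity; production engines calibrate ranges from the data and differ in details, so none of this mirrors a specific engine's implementation. All names are hypothetical.

```python
def scalar_quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255 (one byte per dimension)."""
    scale = 255 / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]

def scalar_dequantize(codes, lo, hi):
    """Approximate the original floats from their one-byte codes."""
    step = (hi - lo) / 255
    return [lo + c * step for c in codes]

def binary_quantize(vector):
    """Keep only the sign of each dimension: one bit per dimension."""
    return [1 if x > 0 else 0 for x in vector]

def hamming_distance(bits_a, bits_b):
    """Number of positions where two bit vectors differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

def storage_bytes(num_vectors, dims, bits_per_dim):
    """Raw storage for a corpus at a given precision."""
    return num_vectors * dims * bits_per_dim // 8

vector = [-1.0, -0.2, 0.3, 1.0]
codes = scalar_quantize(vector, lo=-1.0, hi=1.0)
restored = scalar_dequantize(codes, lo=-1.0, hi=1.0)
bits = binary_quantize(vector)

float32_bytes = storage_bytes(1_000_000, 1536, bits_per_dim=32)
int8_bytes = storage_bytes(1_000_000, 1536, bits_per_dim=8)   # 4x smaller
binary_bytes = storage_bytes(1_000_000, 1536, bits_per_dim=1) # 32x smaller
```

The restored values differ slightly from the originals, which is the recall cost; a re-ranking pass over the full-precision vectors of the top candidates is the usual way to win it back.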

The practical takeaway: pick the engine by which memory-reduction path it supports at your scale, not by a leaderboard QPS number measured on a corpus that fits entirely in RAM.

pgvector: When Your Vector Database Should Just Be Postgres

pgvector is an open-source extension that adds a vector type and ANN indexing to PostgreSQL. Its entire value proposition is not being a separate system. Your embeddings sit in the same database as your relational data, inside the same transactions, backed up by the same tooling, secured by the same roles, and queried by the same SQL.

A clause like `WHERE topic = 'kubernetes'` is pgvector's superpower: filtering is just SQL, planned and executed by an engine that has done it for thirty years, and you can JOIN retrieved rows against any other table. The distance operators are terse: `<->` for L2, `<#>` for (negative) inner product, `<=>` for cosine distance, `<+>` for L1.
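A filtered similarity query combines those two pieces: an ordinary WHERE clause and an ORDER BY on a distance operator. The sketch below only builds the parameterized SQL string so it runs without a database. The `documents` table and its `id`, `content`, `topic`, and `embedding` columns are hypothetical, and the `%s` placeholders assume a psycopg-style driver.

```python
# Distance operators documented by pgvector, keyed by a metric name of our choosing.
PGVECTOR_OPERATORS = {
    "l2": "<->",
    "inner_product": "<#>",  # returns the negative inner product
    "cosine": "<=>",         # cosine distance
    "l1": "<+>",
}

def build_filtered_search(table, metric, filter_column):
    """Return parameterized SQL: filter with WHERE, rank by vector distance.

    table and filter_column are interpolated directly, so they must be
    trusted constants, never user input.
    """
    operator = PGVECTOR_OPERATORS[metric]
    return (
        f"SELECT id, content FROM {table} "
        f"WHERE {filter_column} = %s "
        f"ORDER BY embedding {operator} %s::vector "
        f"LIMIT %s"
    )

sql = build_filtered_search("documents", "cosine", "topic")
# Parameters a driver would bind: filter value, query embedding as text, and k.
params = ("kubernetes", "[0.1,0.2,0.3]", 5)
```

Because this is plain SQL, the same statement can gain a JOIN or a second predicate without any vector-specific API.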

The limits are equally clear. Indexed vectors are capped at 2,000 dimensions (4,000 with halfvec), you scale with a single Postgres node plus read replicas rather than a distributed cluster, and squeezing top-tier ANN performance at tens of millions of vectors usually means adding pgvectorscale for DiskANN-style indexing. For most applications under ~10 million vectors that already use Postgres, pgvector is the highest-ROI choice precisely because it adds zero new operational surface.

Pinecone: The Zero-Ops Managed Standard

Pinecone is the proprietary, fully managed vector database that popularized the category. There is no open-source Pinecone to self-host — you use its cloud, and in exchange you never think about indexes, memory, sharding, or replication. Its serverless architecture separates storage from compute so you pay for what you query rather than for provisioned pods, and it scales up and down without your involvement.

Namespaces give you cheap per-tenant isolation, metadata filtering is first-class, and Pinecone Inference now hosts embedding and reranking models so you can build a RAG pipeline without a separate embedding provider. The trade-offs are the flip side of "managed": your vectors live in someone else's cloud (a non-starter for some compliance regimes), costs are usage-based and can surprise you at high volume, and you accept a proprietary black box you cannot inspect or run on-premise. Pinecone is the right call when engineering time is your scarcest resource and data residency is not a hard constraint.

Qdrant: Rust-Powered Filtering and Quantization

Qdrant is an open-source (Apache 2.0) engine written in Rust, which shows up as predictable low-latency performance and tight memory control. Its signature feature is filterable HNSW: rather than filtering before or after the graph search (each of which has failure modes covered below), Qdrant integrates payload filtering into the graph traversal, so heavily filtered queries stay both accurate and fast.

Qdrant ships scalar, product, and binary quantization, giving you the memory-reduction dial directly. It runs as a single binary for local development, offers a managed Qdrant Cloud, and its FastEmbed library covers embedding generation. It is the strongest pick when your queries lean hard on structured filters — per-user, per-tenant, per-category — and you want open-source control with production-grade performance.
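As a rough sketch of what a filter-heavy request looks like, the snippet below assembles a JSON body using the `must` / `key` / `match` filter shape from Qdrant's filtering documentation. The top-level field names (`query`, `filter`, `limit`) and the payload fields `tenant_id` and `category` are assumptions for illustration; check them against the current Qdrant API reference before relying on them.

```python
import json

def match_condition(key, value):
    """One exact-match condition on a payload field."""
    return {"key": key, "match": {"value": value}}

def build_query_body(query_vector, must_conditions, limit):
    """Assemble a request body: query vector, payload filter, result limit.

    Field names here are assumed, not verified against a live server.
    """
    return {
        "query": query_vector,
        "filter": {"must": must_conditions},
        "limit": limit,
    }

body = build_query_body(
    query_vector=[0.12, 0.34, 0.56],
    must_conditions=[
        match_condition("tenant_id", "acme"),      # hypothetical per-tenant field
        match_condition("category", "runbooks"),   # hypothetical per-category field
    ],
    limit=5,
)
payload = json.dumps(body)
```

Every condition under `must` has to hold, so this expresses "nearest vectors for tenant acme within the runbooks category" in a single request.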

Weaviate: Built-In Vectorization and Native Hybrid Search

Weaviate is an open-source (BSD-3) engine written in Go whose differentiator is batteries included. Through vectorizer modules it can call an embedding model for you at insert and query time, so you send raw text and Weaviate produces and stores the vectors, with no separate embedding step in your pipeline. It also ships native hybrid search, fusing BM25 keyword scoring with vector similarity in a single query rather than making you stitch two systems together.

Weaviate's hybrid queries expose an alpha parameter that slides from pure keyword (0) to pure vector (1), which is exactly the control RAG systems need when semantic search alone misses exact identifiers, error codes, or product names. Weaviate supports HNSW with quantization, strong multi-tenancy for SaaS builders, and Weaviate Cloud for a managed path. Choose it when you want retrieval and embedding as one integrated system, especially if hybrid search is a hard requirement.
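The effect of an alpha weight is easy to see in a generic score-fusion sketch. This is not Weaviate's code: it is a simple min-max-normalize-then-blend scheme written for illustration, with made-up document ids and scores.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(keyword_scores, vector_scores, alpha):
    """Blend two score sets: alpha=0 is keyword only, alpha=1 is vector only."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    doc_ids = set(kw) | set(vec)
    fused = {
        d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
        for d in doc_ids
    }
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical scores for the query "error 404".
keyword_scores = {"err-404-guide": 12.0, "http-overview": 3.0, "networking-intro": 1.0}
vector_scores = {"err-404-guide": 0.70, "http-overview": 0.90, "networking-intro": 0.60}

keyword_first = hybrid_rank(keyword_scores, vector_scores, alpha=0.0)
vector_first = hybrid_rank(keyword_scores, vector_scores, alpha=1.0)
balanced = hybrid_rank(keyword_scores, vector_scores, alpha=0.5)
```

With these numbers the exact-match guide wins at alpha 0 and 0.5, while the semantically closest overview wins at alpha 1, which is the behavior you tune when literal identifiers matter.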

Milvus: Distributed Scale to Billions of Vectors

Milvus is an open-source (Apache 2.0) database engineered for the largest corpora. Its distributed architecture separates concerns into independent, individually scalable roles — query nodes, data nodes, index nodes — coordinated through etcd, a message queue (Pulsar or Kafka), and object storage (MinIO or S3). This is heavy to operate but is what lets Milvus serve billions of vectors with high throughput.

Milvus offers the widest index menu of any option here: FLAT for exact search, the IVF family, HNSW, DiskANN for disk-backed scale, and GPU indexes (CAGRA, via NVIDIA's RAFT) for workloads where GPU acceleration justifies the hardware. For local development you are not forced into the full cluster: installing the `milvus-lite` package (the successor to the older `milvus` package on PyPI) gives you Milvus Lite, an embedded version, and Zilliz Cloud offers the fully managed product.

The calculus with Milvus is straightforward: below tens of millions of vectors its distributed machinery is overkill and a simpler engine will serve you better; at hundreds of millions to billions, that same machinery is exactly what you need and few alternatives compete. Pick Milvus when scale is the dominant requirement — or pick Zilliz Cloud to get that scale without running the cluster yourself.

How Should You Handle Metadata Filtering?

Filtering is where vector search quietly goes wrong, and it deserves more attention than benchmark charts give it. Say you want "the 5 most similar documents where topic = 'kubernetes'." There are three ways an engine can do this:

  • Post-filtering: find the top-K nearest vectors, then discard those that fail the filter. The danger: if only one of the top-K matches `topic = 'kubernetes'`, or none does, you get one result or zero even though matching documents exist deeper in the index. You asked for 5 and silently got fewer.
  • Pre-filtering: restrict to matching rows first, then search vectors only within that set. Correct, but naive implementations lose the ANN index and fall back to a slow brute-force scan over the filtered subset.
  • Filtered/integrated search: apply the filter during graph traversal so the index stays in play and results stay complete. This is Qdrant's filterable HNSW, Weaviate's filtered search, and Milvus's boolean-expression filtering.
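The post-filter failure mode is easy to reproduce. In the sketch below a brute-force sort stands in for the ANN index, and the five documents are invented for the example; the point is only the order of operations.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def post_filter_search(query, docs, k, predicate):
    """Take the k nearest first, then drop rows that fail the filter."""
    nearest = sorted(docs, key=lambda d: squared_l2(query, d["vector"]))[:k]
    return [d["id"] for d in nearest if predicate(d)]

def pre_filter_search(query, docs, k, predicate):
    """Filter first, then take the k nearest among the matching rows."""
    matching = [d for d in docs if predicate(d)]
    matching.sort(key=lambda d: squared_l2(query, d["vector"]))
    return [d["id"] for d in matching[:k]]

# Hypothetical corpus: the two closest documents are about the wrong topic.
docs = [
    {"id": 1, "topic": "docker", "vector": [0.0, 0.1]},
    {"id": 2, "topic": "docker", "vector": [0.1, 0.0]},
    {"id": 3, "topic": "kubernetes", "vector": [0.5, 0.5]},
    {"id": 4, "topic": "kubernetes", "vector": [0.9, 0.9]},
    {"id": 5, "topic": "kubernetes", "vector": [2.0, 2.0]},
]

def is_kubernetes(doc):
    return doc["topic"] == "kubernetes"

post = post_filter_search([0.0, 0.0], docs, k=2, predicate=is_kubernetes)  # []
pre = pre_filter_search([0.0, 0.0], docs, k=2, predicate=is_kubernetes)    # [3, 4]
```

Post-filtering returns nothing here even though three matching documents exist, while pre-filtering returns the two you asked for at the cost of scanning the filtered subset.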

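pgvector hands filtering to the PostgreSQL query planner, which uses real table statistics to choose between an exact scan over the filtered rows and the ANN index, one of the underrated advantages of keeping vectors in a mature relational engine. It does not escape the trade-off entirely: when the planner uses an HNSW or IVFFlat index, the filter is applied after the index scan, so a selective filter can still return fewer rows than requested unless you raise `hnsw.ef_search` or enable the iterative index scans added in pgvector 0.8.0. When you evaluate any vector database, test it with your real filters applied, not just on unfiltered top-K recall, because filtered performance and filtered correctness are what production actually exercises.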

What About Hybrid Search and Cost?

Hybrid search — combining dense vector similarity with sparse keyword matching (BM25) — has become a near-requirement because pure semantic search misses exact tokens like SKUs, function names, and error codes. Weaviate offers it natively with a fusion alpha; Qdrant, Milvus, and Pinecone support it through sparse-plus-dense vectors; pgvector expects you to combine vector distance with PostgreSQL's tsvector full-text search yourself. If hybrid is central to your product, weight Weaviate's turnkey implementation accordingly.
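If you combine the two result lists yourself, as with pgvector plus full-text search, one common and simple option is reciprocal rank fusion (RRF). It is named here as one possible technique, not as what any of these engines does internally. It needs only the rank positions, so the keyword and vector scores never have to be on the same scale. The constant `k=60` is a conventional default, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for the same query from two retrievers.
keyword_ranking = ["sku-123", "faq", "guide"]    # e.g. from full-text search
vector_ranking = ["guide", "sku-123", "blog"]    # e.g. from vector distance

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Documents that appear near the top of both lists rise to the top of the fused list, so `sku-123` outranks `guide` here, and documents found by only one retriever still make the cut.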

Cost splits cleanly along the hosting axis, and prices change often — always confirm current numbers on each vendor's page:

The pattern mirrors the broader observability and framework markets we covered in LangSmith vs. Langfuse vs. Phoenix: the open-source engines are "free" only if you value your operational time at zero, while managed services convert that time into a predictable bill.

Which Vector Database Should You Choose?

There is no universal winner — map the engine to your binding constraint:

  • Choose pgvector if your data already lives in PostgreSQL, you are under roughly 10 million vectors, and you would rather extend a database you already operate than adopt and secure a new one. It is the correct default far more often than the hype suggests.
  • Choose Pinecone if you want zero operational burden, are comfortable with a proprietary cloud, and value shipping speed over infrastructure control or on-premise data residency.
  • Choose Qdrant if your queries are filter-heavy, you want open-source control with Rust-grade performance, and fine-grained quantization control matters.
  • Choose Weaviate if you want embedding generation and native hybrid search built into the database, or you are building multi-tenant SaaS retrieval.
  • Choose Milvus (or Zilliz Cloud) if you are heading toward hundreds of millions or billions of vectors and need a genuinely distributed engine with the broadest index and GPU support.

The meta-point: the databases converge on the same ANN algorithms, so your decision rarely hinges on raw search speed. It hinges on operational model, filtering behavior, scaling ceiling, and cost — the dimensions a leaderboard cannot show you. Prototype with the one that fits your team's constraints, benchmark it on your data with your filters, and remember that the embedding model you feed it usually affects retrieval quality more than the database you pick.

Frequently Asked Questions

Is pgvector good enough for production RAG, or do I need a dedicated vector database?

For most applications under roughly 10 million vectors, pgvector is production-ready and often the better engineering choice because it adds no new system to operate. It supports HNSW indexing, cosine/L2/inner-product distance, and full SQL filtering and joins inside PostgreSQL's transactions and backups. You typically outgrow it when you need to scale beyond a single node's memory, serve hundreds of millions of vectors, or require distributed high availability — at which point a dedicated engine like Milvus or Qdrant, or the pgvectorscale extension for DiskANN indexing, becomes worthwhile.

What is the difference between HNSW and IVF indexes?

HNSW (Hierarchical Navigable Small World) builds a multi-layer navigable graph and delivers high recall at high query throughput, but it holds the graph in RAM, so memory scales with your corpus. IVF (inverted file) partitions vectors into clusters and searches only the nearest few, using far less memory but generally offering lower recall at the same speed and requiring periodic retraining as data changes. HNSW is the default for latency-sensitive workloads that fit in RAM; IVF and disk-based indexes like DiskANN are preferred when memory cost dominates at very large scale.
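A toy IVF index shows where the recall loss comes from. In this sketch the centroids are hard-coded, whereas a real index learns them with k-means during training, and `nprobe` is the number of clusters scanned per query. All names and data are illustrative.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every vector to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: squared_l2(vec, centroids[i]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: squared_l2(query, centroids[i]))[:nprobe]
    candidates = [vec_id for i in probe for vec_id in lists[i]]
    candidates.sort(key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return candidates[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]  # assumed to come from prior training
vectors = {
    "a": [0.0, 1.0],
    "b": [1.0, 0.0],
    "c": [4.9, 4.9],   # lands in cluster 0, close to the cluster boundary
    "d": [9.0, 10.0],
    "e": [10.0, 9.0],
}
lists = build_ivf(vectors, centroids)

query = [5.5, 5.5]
fast = ivf_search(query, vectors, centroids, lists, k=1, nprobe=1)     # ["d"]
thorough = ivf_search(query, vectors, centroids, lists, k=1, nprobe=2) # ["c"]
```

With `nprobe=1` the search scans only cluster 1 and misses the true nearest neighbor `c`, which sits just across the boundary in cluster 0. Raising `nprobe` recovers it at the cost of scanning more vectors.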

Which vector database is cheapest?

Self-hosting the open-source engines — pgvector, Qdrant, Weaviate, or Milvus — is the cheapest in licensing terms, since you pay only for the infrastructure they run on, but you absorb the engineering cost of operating them. pgvector is often the lowest total cost when you already run PostgreSQL, because it adds no new servers. Managed services like Pinecone, Zilliz Cloud, and Weaviate Cloud cost more in dollars but eliminate operational overhead; whether that trade is "cheaper" depends on how you value your team's time and your query volume.

Do I need hybrid search for retrieval-augmented generation?

You need hybrid search when your queries contain exact tokens that pure semantic search tends to miss — product SKUs, error codes, function names, or specific identifiers — because dense vector similarity captures meaning but can overlook literal string matches. Hybrid search fuses vector similarity with keyword (BM25) scoring to get both. Weaviate provides it natively with a tunable fusion weight; Qdrant, Milvus, and Pinecone support it via sparse-plus-dense vectors; with pgvector you combine vector distance with PostgreSQL full-text search manually. If your corpus is purely conceptual prose, dense-only retrieval may be sufficient.

About the Author

Aaron is an engineering leader, software architect, and founder with 18 years of experience building distributed systems and cloud infrastructure. He now focuses on LLM-powered platforms, agent orchestration, and production AI, and shares hands-on technical guides and framework comparisons at fp8.co.

Cite this Article

Aaron. "Vector Databases 2026: pgvector vs Pinecone vs Qdrant." fp8.co, July 1, 2026. https://fp8.co/articles/Vector-Database-Comparison-pgvector-Pinecone-Qdrant-Weaviate-Milvus
