Storage Layer

Vector Databases

Pinecone, Qdrant, Weaviate, pgvector, and Chroma — choosing and operating a vector store for production RAG

The three operations: store → index → query
The production requirement: filtered ANN search
The scale range: millions to billions
Contents
  1. What is a vector DB
  2. pgvector: Postgres-native
  3. Qdrant: filterable ANN
  4. Weaviate: schema + search
  5. Pinecone: managed
  6. Chroma: developer-first
  7. Choosing and operating
01 — Definition

What Is a Vector Database?

A vector database stores high-dimensional embedding vectors alongside metadata, and provides approximate nearest neighbor (ANN) search — find the top-k vectors most similar to a query vector.

Compared to traditional DBs: relational databases filter and sort on exact values; a vector database ranks items by geometric distance between embeddings, surfacing "semantically similar" results.

Core Operations

Insert: store vector + metadata + optional text.
Query: find top-k nearest by cosine/dot/L2.
Filter: restrict search by metadata.
Delete/Update: remove or modify stored vectors.
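A minimal sketch of these operations using exact (brute-force) cosine search; a real vector DB replaces the linear scan with an ANN index, but the insert/query/filter semantics are the same. The store layout and helper name here are illustrative, not any product's API:

```python
import numpy as np

def cosine_top_k(store, query_vec, k=3, metadata_filter=None):
    """Exact top-k cosine search over an in-memory store.
    `store` is a list of (id, vector, metadata) tuples."""
    # Filter step: restrict candidates by metadata before scoring
    candidates = [
        (doc_id, vec) for doc_id, vec, meta in store
        if metadata_filter is None
        or all(meta.get(key) == val for key, val in metadata_filter.items())
    ]
    # Query step: score every remaining vector and keep the top k
    scored = []
    for doc_id, vec in candidates:
        sim = float(np.dot(query_vec, vec) /
                    (np.linalg.norm(query_vec) * np.linalg.norm(vec)))
        scored.append((doc_id, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Insert step: four vectors with metadata
store = [
    ("a", np.array([1.0, 0.0]), {"category": "finance"}),
    ("b", np.array([0.9, 0.1]), {"category": "finance"}),
    ("c", np.array([0.0, 1.0]), {"category": "tech"}),
    ("d", np.array([0.7, 0.7]), {"category": "tech"}),
]
hits = cosine_top_k(store, np.array([1.0, 0.0]), k=2,
                    metadata_filter={"category": "finance"})
```

The linear scan is O(n) per query; ANN indices trade a little recall for sublinear search time.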

⚠️ A vector database is NOT just a vector search library. Production vector DBs add: persistence, horizontal scaling, real-time insert/delete, filtered search, multi-tenancy, backups, and access control.
02 — Postgres-native

pgvector: Postgres-Native

pgvector: open-source Postgres extension. Adds a vector column type and HNSW/IVFFlat indices.

Why use it: you already have Postgres, SQL joins across vector and relational data, ACID transactions, existing auth/backup infrastructure.

Supports: cosine, L2, inner product similarity. HNSW index (v0.5+) for fast ANN. IVFFlat for memory efficiency.

Example: pgvector Setup and Query

-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536),  -- OpenAI text-embedding-3-small
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast ANN
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);

-- Semantic search with metadata filter
SELECT content, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'finance'  -- filter first
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;
For most teams already on Postgres, pgvector is the right default up to ~5M vectors. The query planner, connection pooling (PgBouncer), and monitoring you already have apply directly.
03 — Filterable ANN

Qdrant: Filterable ANN

Qdrant: Rust-based vector DB, designed from scratch for filtered ANN search. Open-source + managed cloud.

Key differentiator: payload filtering is first-class. Rather than post-filtering a result list (which kills recall), Qdrant's filterable HNSW adds extra links to the graph and evaluates the filter during traversal, switching to a payload-index-driven scan when the filter is highly selective.

Supports: dense vectors, sparse vectors (for BM25-style lexical matching), and multi-vectors (ColBERT-style late interaction). Named vectors allow multiple embeddings per document.

Example: Qdrant Python Client

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# Create collection
client.create_collection(
    "documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Insert with payload
client.upsert("documents", points=[
    PointStruct(id=1, vector=embedding,
                payload={"category": "finance", "date": "2024-01-15"}),
])

# Filtered ANN search
results = client.search(
    "documents",
    query_vector=query_embedding,
    query_filter=Filter(must=[
        FieldCondition(key="category", match=MatchValue(value="finance")),
    ]),
    limit=10,
)

Qdrant vs pgvector for Filtered Search

Scenario | pgvector | Qdrant
Low cardinality filter (2 categories) | Good | Excellent
High cardinality filter (1000 users) | Degrades | Maintains recall
Complex nested filters | SQL | Payload filter JSON
Pre-filter selectivity <1% | ANN recall drops | Optimized
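Why post-filtering degrades results can be shown with a small synthetic simulation: when only 1% of points pass the filter, filtering a top-k list after the search returns far fewer than k hits, while filtering before the search keeps all k. The sizes and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, k = 10_000, 32, 10
vectors = rng.normal(size=(n, dim))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
# Only ~1% of points match the metadata filter (highly selective)
matches_filter = rng.random(n) < 0.01

query = vectors[0]
scores = vectors @ query  # cosine similarity (unit vectors)

# Post-filtering: take top-k by similarity, THEN drop non-matching points
post_ids = np.argsort(-scores)[:k]
post_hits = [i for i in post_ids if matches_filter[i]]

# Pre-filtering: restrict to matching points, THEN take top-k
pre_candidates = np.flatnonzero(matches_filter)
pre_hits = pre_candidates[np.argsort(-scores[pre_candidates])[:k]]

print(len(post_hits), len(pre_hits))  # post-filtering returns far fewer than k
```

A real ANN index has the same failure mode, which is why filter-aware graph traversal (Qdrant) or SQL pre-filtering (pgvector, when the planner cooperates) matters at high filter selectivity.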
04 — Schema + Search

Weaviate: Schema + Search

Weaviate: open-source vector DB with GraphQL API, built-in text/image vectorizers, and multi-modal support.

Schema-first: define classes with properties and vectorizer config. Weaviate handles embedding generation if you configure a module.

Hybrid search: built-in BM25 + vector fusion with a single API call. Alpha parameter controls dense vs sparse weight.
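The alpha-weighted fusion can be sketched as follows. This mirrors the idea of Weaviate's relative score fusion (min-max normalize each result list, then combine with alpha), not its exact implementation; the function name and inputs are illustrative:

```python
def hybrid_fuse(bm25_scores, vector_scores, alpha=0.7):
    """Alpha-weighted fusion of min-max-normalized score dicts.
    alpha = 0 -> pure BM25, alpha = 1 -> pure vector."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid divide-by-zero on a single score
        return {doc: (s - lo) / span for doc, s in scores.items()}

    bm25 = normalize(bm25_scores)
    vec = normalize(vector_scores)
    docs = set(bm25) | set(vec)  # a doc may appear in only one list
    return sorted(
        ((doc, alpha * vec.get(doc, 0.0) + (1 - alpha) * bm25.get(doc, 0.0))
         for doc in docs),
        key=lambda pair: pair[1], reverse=True,
    )

ranked = hybrid_fuse(
    bm25_scores={"d1": 12.0, "d2": 3.0},
    vector_scores={"d2": 0.9, "d3": 0.8},
    alpha=0.7,
)
```

With alpha = 0.7, "d2" wins because it appears in both lists; normalization keeps the raw BM25 scale (unbounded) from dominating the bounded cosine scores.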

Multi-tenancy: built-in tenant isolation — each tenant gets its own HNSW index shard. Critical for SaaS applications.

Example: Weaviate Python Client with Hybrid Search

import weaviate
from weaviate.classes.query import Filter

client = weaviate.connect_to_local()
collection = client.collections.get("Document")

# Hybrid search (BM25 + vector)
results = collection.query.hybrid(
    query="transformer attention mechanism",
    alpha=0.7,  # 0 = pure BM25, 1 = pure vector
    limit=10,
    filters=Filter.by_property("category").equal("ml"),
)

for obj in results.objects:
    print(obj.properties["content"][:100])
05 — Managed, Serverless

Pinecone: Managed, Serverless

Pinecone: fully managed, serverless vector DB. No infrastructure to operate. Scales automatically.

Pods vs Serverless: pods = dedicated compute (predictable latency), serverless = pay per query (scales to zero)

Strong at: multi-tenancy via namespaces, metadata filtering, hybrid search (sparse + dense)

Limitations: no SQL joins, no self-hosting, more expensive at high query volumes vs self-hosted

Example: Pinecone Serverless

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Create serverless index
pc.create_index(
    "my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("my-index")

# Upsert vectors with metadata
index.upsert(
    vectors=[{
        "id": "doc_1",
        "values": embedding,
        "metadata": {"category": "tech", "date": "2024-01"},
    }],
    namespace="tenant_123",  # namespace for multi-tenancy
)

# Query with filter
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$eq": "tech"}},
    namespace="tenant_123",
)
06 — Developer-first

Chroma: Developer-First

Chroma: Python-native, embeddable vector DB. Runs in-process (no server) or as a server. Built for rapid prototyping and small-to-medium scale.

Built-in embedding: configure an embedding function once; Chroma calls it automatically on insert and query

Persistent or in-memory: ephemeral for testing, persistent (SQLite-backed) storage for development, server mode for production

Example: Chroma Integration

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.PersistentClient(path="./chroma_db")
ef = OpenAIEmbeddingFunction(api_key="YOUR_API_KEY",
                             model_name="text-embedding-3-small")
collection = client.get_or_create_collection("docs", embedding_function=ef)

# Add documents — Chroma handles embedding
collection.add(
    documents=["RAG is retrieval-augmented generation...",
               "Transformers use attention mechanisms..."],
    ids=["doc1", "doc2"],
    metadatas=[{"topic": "rag"}, {"topic": "architecture"}],
)

# Query — Chroma handles query embedding
results = collection.query(
    query_texts=["how does RAG work?"],
    n_results=3,
    where={"topic": "rag"},
)
07 — Decision Guide

Choosing and Operating a Vector DB

Vector DB Decision Guide

Situation | Recommendation
Already on Postgres, <5M vectors | pgvector
Need complex filtering at scale | Qdrant
Multi-tenant SaaS product | Weaviate or Pinecone (namespaces)
No infra team, serverless preferred | Pinecone Serverless
Prototyping / development | Chroma
>100M vectors, on-prem | Milvus or Qdrant distributed

Operational Patterns

Index warm-up: run representative queries after an HNSW index is built so caches are populated before it serves production traffic.
Hot/cold segmentation: recent vectors in a hot index, older vectors in a cold one.
Embedding cache: avoid re-embedding identical texts.
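The embedding-cache pattern can be sketched as a small content-keyed dictionary; the `embed_fn` stub and model name below are placeholders, not a real API client:

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings keyed by a hash of (model, text) so identical
    chunks are never re-embedded."""

    def __init__(self, embed_fn, model="text-embedding-3-small"):
        self._embed_fn = embed_fn
        self._model = model
        self._cache = {}
        self.misses = 0  # number of actual embed_fn calls

    def _key(self, text):
        # Include the model name: the same text embeds differently per model
        return hashlib.sha256(f"{self._model}\x00{text}".encode()).hexdigest()

    def embed(self, text):
        key = self._key(text)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]

# Demo with a stub embedder (a real one would call an embedding API)
cache = EmbeddingCache(embed_fn=lambda text: [float(len(text))])
cache.embed("hello")
cache.embed("hello")   # served from cache, no second embed_fn call
cache.embed("world!")
```

In production the dictionary would typically be Redis or a database table, keyed the same way, so re-ingesting unchanged documents costs nothing.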

Monitoring

Track query latency (P50/P95/P99), recall (replay ground-truth queries regularly), index size growth, and failed queries
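Recall monitoring boils down to comparing the ANN index's answers against exact brute-force search on a held-out query set. A minimal sketch, with a simulated ANN result standing in for a real index:

```python
import numpy as np

def recall_at_k(ann_ids, exact_ids):
    """Fraction of ground-truth nearest neighbors the ANN index returned."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)

# Ground truth: exact (brute-force) top-10 for one query
rng = np.random.default_rng(1)
vectors = rng.normal(size=(1000, 16))
query = rng.normal(size=16)
scores = vectors @ query
exact_top10 = np.argsort(-scores)[:10]

# Simulate an ANN index that missed one true neighbor (-1 is a dummy id)
ann_top10 = list(exact_top10[:9]) + [-1]
print(recall_at_k(ann_top10, exact_top10))  # 0.9
```

In practice you would run this over a fixed sample of real queries on a schedule and alert when recall@k drifts below target, since HNSW recall degrades silently as data distribution shifts.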

Tools Grid

Tool | Type | Highlights
pgvector | Postgres extension | HNSW + IVFFlat indices, SQL joins, ACID
Qdrant | Rust vector DB | Filtered ANN, sparse vectors, multi-vector
Weaviate | GraphQL vector DB | Schema-first, hybrid search, multi-tenancy
Pinecone | Managed serverless | No ops, auto-scaling, namespaces
Chroma | Python embeddable | In-process or server, auto-embedding
Milvus | Distributed vector DB | Kubernetes-native, petabyte scale