01 — Definition
What Is a Vector Database?
A vector database stores high-dimensional embedding vectors alongside metadata, and provides approximate nearest neighbor (ANN) search — find the top-k vectors most similar to a query vector.
Contrast with traditional databases: relational DBs filter and sort on exact values; vector DBs retrieve "semantically similar" items by geometric distance in embedding space.
Core Operations
Insert: store vector + metadata + optional text.
Query: find top-k nearest by cosine, dot product, or L2 distance.
Filter: restrict search by metadata predicates.
Delete/Update: remove or modify stored vectors.
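A minimal brute-force sketch makes these operations concrete (an exact O(n) scan, not ANN — this is the naive baseline a vector DB's index replaces; all names and data here are illustrative):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny in-memory "collection": one vector + metadata per item
collection = [
    {"id": 1, "vector": [1.0, 0.0], "category": "finance"},
    {"id": 2, "vector": [0.9, 0.1], "category": "finance"},
    {"id": 3, "vector": [0.0, 1.0], "category": "sports"},
]

def top_k(query, k, category=None):
    # Filter: restrict candidates by metadata before scoring
    candidates = [p for p in collection
                  if category is None or p["category"] == category]
    # Query: rank every candidate by cosine similarity (exact scan)
    ranked = sorted(candidates,
                    key=lambda p: cosine_similarity(query, p["vector"]),
                    reverse=True)
    return ranked[:k]

print([p["id"] for p in top_k([1.0, 0.05], k=2)])  # → [1, 2]
```

A real vector DB replaces the sorted scan with an ANN index (HNSW, IVF) so the query cost does not grow linearly with collection size.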
⚠️
A vector database is NOT just a vector search library. Production vector DBs add: persistence, horizontal scaling, real-time insert/delete, filtered search, multi-tenancy, backups, and access control.
02 — Postgres-native
pgvector: Postgres-Native
pgvector: open-source Postgres extension. Adds vector column type and HNSW/IVFFlat indices.
Why use it: you already have Postgres, SQL joins across vector and relational data, ACID transactions, existing auth/backup infrastructure.
Supports cosine, L2, and inner-product distance. HNSW index (v0.5.0+) for fast, high-recall ANN; IVFFlat for lower memory use and faster index builds.
Example: pgvector Setup and Query
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536), -- OpenAI text-embedding-3-small
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create HNSW index for fast ANN
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 200);
-- Semantic search with metadata filter
SELECT content, metadata,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'finance' -- metadata filter
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;
✓
For most teams already on Postgres, pgvector is the right default up to ~5M vectors. The query planner, connection pooling (PgBouncer), and monitoring you already have apply directly.
03 — Filterable ANN
Qdrant: Filterable ANN
Qdrant: Rust-based vector DB, designed from scratch for filtered ANN search. Open-source + managed cloud.
Key differentiator: "payload filtering" is first-class — it doesn't fall back to post-filtering (which kills recall). Filters are evaluated during HNSW graph traversal, with extra graph links built from payload indexes so the graph stays navigable even under selective filters.
Supports: dense vectors, sparse vectors (BM25-style term weights), and multi-vectors (late interaction, e.g. ColBERT). Named vectors allow multiple embeddings per document.
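Why post-filtering "kills recall" can be shown in plain Python (illustrative only — a real engine like Qdrant does the filter-aware variant inside the HNSW traversal; data here is made up):

```python
def brute_force_top_k(points, score, k):
    return sorted(points, key=score, reverse=True)[:k]

# 100 points; only 5 belong to tenant "a", and none of them top the ranking
points = [{"id": i, "score": 1.0 - i / 100,
           "tenant": "a" if i % 20 == 0 else "b"}
          for i in range(100)]
sim = lambda p: p["score"]

# Post-filtering: take the top-k FIRST, then drop non-matching points.
# With a selective filter, few (or zero) of the k results survive.
post = [p for p in brute_force_top_k(points, sim, k=10)
        if p["tenant"] == "a"]

# Filter-aware search: restrict candidates first, then rank.
# Always returns up to k genuine matches.
pre = brute_force_top_k([p for p in points if p["tenant"] == "a"],
                        sim, k=10)

print(len(post), len(pre))  # → 1 5
```

Post-filtering returned one usable result out of ten requested; the filter-aware path found all five matches.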
Example: Qdrant Python Client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
client = QdrantClient(url="http://localhost:6333")
# `embedding` / `query_embedding` below: 1536-dim vectors from your embedding model
# Create collection
client.create_collection("documents", vectors_config=VectorParams(
size=1536, distance=Distance.COSINE
))
# Insert with payload
client.upsert("documents", points=[
PointStruct(id=1, vector=embedding,
payload={"category": "finance", "date": "2024-01-15"})
])
# Filtered ANN search
results = client.search(
"documents",
query_vector=query_embedding,
query_filter=Filter(must=[
FieldCondition(key="category", match=MatchValue(value="finance"))
]),
limit=10
)
Qdrant vs pgvector for Filtered Search
| Scenario | pgvector | Qdrant |
|---|---|---|
| Low cardinality filter (2 categories) | Good | Excellent |
| High cardinality filter (1000 users) | Degrades | Maintains recall |
| Complex nested filters | SQL | Payload filter JSON |
| Pre-filter selectivity <1% | ANN recall drops | Optimized |
04 — Schema + Search
Weaviate: Schema + Search
Weaviate: open-source vector DB with GraphQL API, built-in text/image vectorizers, and multi-modal support.
Schema-first: define classes with properties and vectorizer config. Weaviate handles embedding generation if you configure a module.
Hybrid search: built-in BM25 + vector fusion with a single API call. Alpha parameter controls dense vs sparse weight.
Multi-tenancy: built-in tenant isolation — each tenant gets its own HNSW index shard. Critical for SaaS applications.
Example: Weaviate Python Client with Hybrid Search
import weaviate
client = weaviate.connect_to_local()
collection = client.collections.get("Document")
# Hybrid search (BM25 + vector)
results = collection.query.hybrid(
query="transformer attention mechanism",
alpha=0.7, # 0 = pure BM25, 1 = pure vector
limit=10,
filters=weaviate.classes.query.Filter.by_property("category").equal("ml")
)
for obj in results.objects:
print(obj.properties["content"][:100])
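The alpha blend can be sketched as a relative-score fusion over normalized BM25 and vector scores (a simplification of Weaviate's internal fusion; the scores below are made up):

```python
def min_max_normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_fuse(bm25_scores, vector_scores, alpha):
    """alpha = 0 -> pure BM25, alpha = 1 -> pure vector."""
    bm25 = min_max_normalize(bm25_scores)
    vec = min_max_normalize(vector_scores)
    docs = set(bm25) | set(vec)
    return sorted(docs,
                  key=lambda d: alpha * vec.get(d, 0.0)
                                + (1 - alpha) * bm25.get(d, 0.0),
                  reverse=True)

bm25_scores = {"d1": 12.0, "d2": 3.0, "d3": 8.0}   # raw keyword scores
vector_scores = {"d1": 0.55, "d2": 0.91, "d3": 0.60}  # cosine similarities
print(hybrid_fuse(bm25_scores, vector_scores, alpha=0.7))  # → ['d2', 'd1', 'd3']
```

At alpha=0.7 the vector winner (d2) leads; at alpha=0 the BM25 winner (d1) would.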
05 — Managed, Serverless
Pinecone: Managed, Serverless
Pinecone: fully managed, serverless vector DB. No infrastructure to operate. Scales automatically.
Pods vs Serverless: pods = dedicated compute (predictable latency), serverless = pay per query (scales to zero)
Strong at: multi-tenancy via namespaces, metadata filtering, hybrid search (sparse + dense)
Limitations: no SQL joins, no self-hosting, more expensive at high query volumes vs self-hosted
Example: Pinecone Serverless
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_API_KEY")
# Create serverless index
pc.create_index("my-index", dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1"))
index = pc.Index("my-index")
# Upsert vectors with metadata
index.upsert(vectors=[
{"id": "doc_1", "values": embedding, "metadata": {"category": "tech", "date": "2024-01"}}
], namespace="tenant_123") # namespace for multi-tenancy
# Query with filter
results = index.query(vector=query_embedding, top_k=10,
filter={"category": {"$eq": "tech"}},
namespace="tenant_123")
06 — Developer-first
Chroma: Developer-First
Chroma: Python-native, embeddable vector DB. Runs in-process (no server) or as a server. Built for rapid prototyping and small-to-medium scale.
Built-in embedding: configure an embedding function once; Chroma calls it automatically on insert and query
Persistent or in-memory: ephemeral in-memory for testing, persistent local storage (SQLite-backed) for development, client/server mode for production
Example: Chroma Integration
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
client = chromadb.PersistentClient(path="./chroma_db")
ef = OpenAIEmbeddingFunction(api_key="YOUR_API_KEY",  # or set OPENAI_API_KEY
                             model_name="text-embedding-3-small")
collection = client.get_or_create_collection("docs", embedding_function=ef)
# Add documents — Chroma handles embedding
collection.add(documents=["RAG is retrieval-augmented generation...",
"Transformers use attention mechanisms..."],
ids=["doc1", "doc2"],
metadatas=[{"topic": "rag"}, {"topic": "architecture"}])
# Query — Chroma handles query embedding
results = collection.query(query_texts=["how does RAG work?"], n_results=3,
where={"topic": "rag"})
07 — Decision Guide
Choosing and Operating a Vector DB
Vector DB Decision Guide
| Situation | Recommendation |
|---|---|
| Already on Postgres, <5M vectors | pgvector |
| Need complex filtering at scale | Qdrant |
| Multi-tenant SaaS product | Weaviate or Pinecone (namespaces) |
| No infra team, serverless preferred | Pinecone Serverless |
| Prototyping / development | Chroma |
| >100M vectors, on-prem | Milvus or Qdrant distributed |
Operational Patterns
Index warm-up: run a few queries after an HNSW index build — the first queries hit cold caches and are slow.
Hot/cold segmentation: keep recent vectors in a hot index and older ones in a cold index.
Embedding cache: cache embeddings keyed by text hash to avoid re-embedding identical texts.
Monitoring
Track query latency (P50/P95/P99), recall (replay ground-truth queries regularly), index size growth, and failed-query rate
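Latency percentiles can be computed from raw query timings with a simple nearest-rank percentile (a sketch; the sample timings are made up):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: simple and good enough for dashboards."""
    s = sorted(values)
    idx = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[idx]

latencies_ms = [12, 15, 14, 90, 13, 16, 250, 14, 15, 13]  # sample timings

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"P50={p50}ms P95={p95}ms P99={p99}ms")  # → P50=14ms P95=250ms P99=250ms
```

Note how a single 250 ms outlier dominates P95/P99 while leaving P50 untouched — which is exactly why tail percentiles, not averages, belong on the dashboard.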