A PostgreSQL extension that adds vector similarity search to your existing Postgres database, enabling RAG without a separate infrastructure component.
The best database is often the one you already have. If your application already uses PostgreSQL for user data, documents, and metadata, adding vectors to the same database means: no extra service to operate, no data synchronisation between databases, joins between vector results and relational data, and full ACID transactions across both.
The limitation: pgvector is not as fast as dedicated vector databases at extreme scale (>10M vectors with high QPS). But for most applications — millions of documents, tens of thousands of queries per day — pgvector handles it comfortably.
```shell
# Using Docker (easiest)
docker run -e POSTGRES_PASSWORD=password -p 5432:5432 pgvector/pgvector:pg16

# Or install the extension in an existing Postgres 12+
# sudo apt install postgresql-16-pgvector
# Then in psql: CREATE EXTENSION vector;

pip install psycopg2-binary pgvector
```
```python
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("postgresql://postgres:password@localhost/mydb")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()
# register_vector must run after the extension exists; it looks up the
# vector type in the database and fails if the type is missing.
register_vector(conn)
```
```python
import psycopg2
from openai import OpenAI
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("postgresql://postgres:password@localhost/mydb")
register_vector(conn)
cur = conn.cursor()

oai = OpenAI()

def embed(text):
    """Return a 1536-dim embedding for the given text."""
    resp = oai.embeddings.create(input=[text], model="text-embedding-3-small")
    return resp.data[0].embedding
```
```python
# Create a table with a vector column
cur.execute('''
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        category TEXT,
        embedding vector(1536)
    )
''')
conn.commit()

# Insert documents
docs = [
    ("Refunds processed in 5 business days.", "policy"),
    ("Free shipping on orders over $50.", "shipping"),
]
for text, cat in docs:
    cur.execute(
        "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
        (text, cat, embed(text)),
    )
conn.commit()
```
```python
# Cosine similarity search (operator: <=>)
query_emb = embed("What is the refund timeline?")
cur.execute('''
    SELECT content, 1 - (embedding <=> %s) AS similarity
    FROM documents
    ORDER BY embedding <=> %s
    LIMIT 3
''', (query_emb, query_emb))
for row in cur.fetchall():
    print(f"Sim {row[1]:.3f}: {row[0]}")
```
Distance operators: `<=>` is cosine distance, `<->` is L2 (Euclidean) distance, `<#>` is negative inner product.
```python
# IVFFlat — faster to build, good for static datasets
cur.execute('''
    CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100)
''')
# Rule of thumb for lists: rows/1000 for up to 1M rows, sqrt(rows) for larger

# HNSW — better recall, faster queries, longer build time (pgvector >= 0.5.0)
cur.execute('''
    CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
''')
# m: connections per node (default 16); ef_construction: candidate list size during build
conn.commit()
```
HNSW is the default choice for new indexes — it has better recall at the same query speed. IVFFlat requires fewer resources to build and is better when you have a huge dataset and want to trade some recall for build time.
pgvector's killer feature: combining Postgres full-text search with vector similarity in a single SQL query. (Note that `ts_rank` is Postgres's built-in lexical ranking function, not true BM25, but it plays the same role in the hybrid query.)
```python
query_text = "What is the refund timeline?"
cur.execute('''
    SELECT
        content,
        ts_rank(to_tsvector('english', content), query) AS text_score,
        1 - (embedding <=> %s) AS vector_score,
        -- Reciprocal Rank Fusion (constant k = 60)
        1.0 / (60 + RANK() OVER (ORDER BY embedding <=> %s)) +
        1.0 / (60 + RANK() OVER (
            ORDER BY ts_rank(to_tsvector('english', content), query) DESC
        )) AS rrf_score
    FROM
        documents,
        plainto_tsquery('english', %s) AS query
    WHERE
        to_tsvector('english', content) @@ query
        OR (embedding <=> %s) < 0.5
    ORDER BY rrf_score DESC
    LIMIT 5
''', (query_emb, query_emb, query_text, query_emb))
```
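If you prefer to fuse rankings in application code instead of SQL, reciprocal rank fusion is only a few lines of Python. A minimal sketch, using the same k = 60 constant as the SQL above; the input lists are assumed to hold document ids ordered best-first:

```python
def rrf_fuse(rankings, k=60):
    """Combine several best-first ranked lists of ids via
    Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# id 2 ranks highly in both lists, so it wins overall.
fused = rrf_fuse([[1, 2, 3], [2, 4, 1]])
```

The constant k dampens the influence of any single list's top result; 60 is the value from the original RRF paper and is rarely worth tuning.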
```python
from sqlalchemy import create_engine, select, Column, Integer, Text
from sqlalchemy.orm import DeclarativeBase, Session
from pgvector.sqlalchemy import Vector

engine = create_engine("postgresql+psycopg2://postgres:password@localhost/mydb")

class Base(DeclarativeBase):
    pass

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    content = Column(Text, nullable=False)
    embedding = Column(Vector(1536))

Base.metadata.create_all(engine)

# Insert
with Session(engine) as session:
    doc = Document(content="Refunds take 5 days.", embedding=embed("Refunds take 5 days."))
    session.add(doc)
    session.commit()

# Query: nearest documents by cosine distance
with Session(engine) as session:
    results = session.scalars(
        select(Document)
        .order_by(Document.embedding.cosine_distance(query_emb))
        .limit(5)
    ).all()
```
- **Indexes must exist before they help.** Adding an HNSW index to a large existing table can take minutes to hours. Plan index creation for low-traffic windows, and use `CREATE INDEX CONCURRENTLY` in production so writes are not blocked during the build.
- **Exact search without an index.** Without an index, every query does a full sequential scan. For small tables (<10k rows) this is fine; for larger tables, always create an index.
- **HNSW recall vs `ef_search`.** At query time, `SET hnsw.ef_search = 100` (default 40) increases recall at the cost of latency. Tune this based on your recall requirements.
- **Dimension limits.** The `vector` type supports up to 16,000 dimensions, but HNSW and IVFFlat indexes only support vectors with up to 2,000 dimensions. Still sufficient for most current embedding models, but worth knowing.
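Both operational knobs above are one-liners. A sketch using hypothetical SQL-building helpers (the index name and defaults are illustrative); note that `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so execute it on a psycopg2 connection with `conn.autocommit = True`:

```python
def hnsw_index_sql(table, column="embedding", opclass="vector_cosine_ops"):
    # Non-blocking HNSW index build. Run on an autocommit connection:
    # CREATE INDEX CONCURRENTLY is rejected inside a transaction block.
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {table}_{column}_hnsw "
        f"ON {table} USING hnsw ({column} {opclass})"
    )

def ef_search_sql(ef=100):
    # Per-session recall knob: widen the HNSW search beam (default 40).
    return f"SET hnsw.ef_search = {ef}"
```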
Choosing between IVFFlat and HNSW indexes in pgvector depends on the performance requirements and dataset size. IVFFlat partitions vectors into lists and searches only the nearest lists, providing faster index build times and lower memory usage but requiring careful tuning of the lists and probes parameters. HNSW builds a hierarchical graph index that provides better recall at comparable query latency but uses 2–3x more memory and takes longer to build. For most production deployments serving online queries, HNSW is the preferred choice due to its consistently higher recall at low latency.
| Index type | Build time | Memory | Recall quality | Best for |
|---|---|---|---|---|
| No index (exact) | N/A | Low | 100% (exact) | <100K rows, exact search |
| IVFFlat | Fast | Medium | Medium (tunable) | Large datasets, build-time sensitive |
| HNSW | Slow | High | High | Production, low latency required |
pgvector's HNSW index folds inserts into the graph as they arrive, while IVFFlat's cluster centroids are fixed at build time: IVFFlat still accepts new rows, but recall degrades as the data drifts from the original clustering until the index is rebuilt. This makes HNSW the safer option for applications with continuous document ingestion where the vector index must remain useful during updates. The m parameter (connections per node, default 16) and ef_construction parameter (candidate list size during build, default 64) control the index quality/build-time tradeoff; increasing them to 32 and 128 respectively produces higher recall with a 3–4x longer build time.
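As a concrete example, the higher-recall build from the paragraph above (the index name is illustrative; run the statement with `cur.execute(HIGH_RECALL_HNSW)`):

```python
# Trade roughly 3-4x build time for higher recall by widening the graph.
HIGH_RECALL_HNSW = '''
CREATE INDEX documents_embedding_hnsw_hq
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128)
'''
```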
Connection pooling is a critical operational consideration for pgvector deployments. Vector similarity queries are computationally more intensive than typical relational queries, holding database connections for longer than simple key lookups. PgBouncer in transaction mode (pooling connections at the transaction level) is the recommended pool configuration, as it handles the higher connection counts typical of horizontally scaled API services without requiring a proportionally large Postgres connection limit. The max_parallel_workers_per_gather Postgres parameter, which controls parallel query execution for vector scans, should be tuned based on available CPU cores to maximize throughput for concurrent vector queries.
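A minimal PgBouncer sketch for transaction-mode pooling; hostnames, pool sizes, and the database name are illustrative:

```ini
; pgbouncer.ini — transaction-mode pooling in front of Postgres
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction     ; release server connections at transaction end
max_client_conn = 1000      ; many API workers may connect
default_pool_size = 20      ; far fewer actual Postgres connections
```

One caveat: in transaction mode, session-level settings such as `SET hnsw.ef_search` do not persist across transactions; use `SET LOCAL` inside the transaction that runs the vector query instead.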
pgvector's metadata filtering enables combined vector and relational queries that filter on structured attributes before or after vector search. The WHERE clause approach (filtering before vector search) reduces the effective search space but may produce lower recall if the filter is highly selective and eliminates most of the relevant vectors. The post-filter approach (retrieve top-k vectors, then apply the WHERE clause in application code) maintains retrieval recall but requires retrieving more candidates than needed to ensure sufficient results after filtering. For high-selectivity filters (returning less than 10% of the corpus), pre-filtering with a larger top-k parameter balances recall and query efficiency.
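The two patterns side by side, as a sketch; the `category` column and the over-fetch factor are illustrative:

```python
# Pre-filter: the WHERE clause runs inside the vector query.
PRE_FILTER_SQL = '''
    SELECT content FROM documents
    WHERE category = %s
    ORDER BY embedding <=> %s
    LIMIT %s
'''

# Post-filter: over-fetch from the index, then filter client-side.
def post_filter(rows, predicate, k):
    """rows must already be ordered best-first (e.g. by cosine distance);
    callers should over-fetch, e.g. request 4*k rows for selective filters."""
    return [r for r in rows if predicate(r)][:k]

hits = [("doc-a", "policy"), ("doc-b", "shipping"), ("doc-c", "policy")]
top = post_filter(hits, lambda r: r[1] == "policy", 2)
# top == [("doc-a", "policy"), ("doc-c", "policy")]
```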
pg_embedding, the competing open-source extension from Neon, has since been deprecated in favor of pgvector, leaving pgvector as the de facto community standard for Postgres vector search. Cloud providers including AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL, and Supabase all support pgvector natively, eliminating the need for self-managed vector database infrastructure for teams already running Postgres. The availability of pgvector on managed Postgres services is the strongest argument for choosing pgvector over standalone vector databases when operational simplicity and a small infrastructure surface area matter more than maximum throughput.
pgvector's support for partial indexes enables memory-efficient retrieval for multi-tenant applications where each tenant's vectors need to be searched independently. Creating a separate HNSW index per tenant using a WHERE clause filter (`CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WHERE tenant_id = 'tenant_A'`) ensures that tenant data is never mixed during search and that the index size scales with per-tenant data volume rather than total data volume. This approach provides tenant isolation without requiring separate tables or databases per tenant, simplifying the schema while maintaining search efficiency for per-tenant queries.
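A sketch of the per-tenant pattern, with table and column names following the example in the text and a hypothetical DDL helper. Note that the planner only uses a partial index when the query repeats the exact index predicate:

```python
def tenant_index_sql(tenant_id: str) -> str:
    # One partial HNSW index per tenant. tenant_id is interpolated into
    # DDL, so it must come from a trusted internal list, never user input.
    return (
        f"CREATE INDEX embeddings_{tenant_id}_hnsw ON embeddings "
        f"USING hnsw (embedding vector_cosine_ops) "
        f"WHERE tenant_id = '{tenant_id}'"
    )

# The query must include the same tenant_id predicate, or the planner
# falls back to a sequential scan.
TENANT_QUERY = '''
    SELECT content FROM embeddings
    WHERE tenant_id = %s
    ORDER BY embedding <=> %s
    LIMIT 10
'''
```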
pgvector's vector type supports multiple distance metrics: L2 distance (Euclidean), inner product, and cosine distance. The choice of distance metric should match the embedding model's training objective — models trained with cosine similarity loss (most sentence transformers) should use cosine distance; models trained to maximize inner product (some recommendation models) should use inner product. Using the wrong distance metric for an embedding model can silently degrade retrieval quality by 10–20 percentage points without triggering any errors, making distance metric documentation and validation a necessary step in pgvector integration.
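One way to make the metric a validated choice rather than an implicit one is a lookup table pairing each metric with its pgvector query operator and index operator class, so the index and the query can never disagree:

```python
# metric -> (query operator, index operator class), per the pgvector docs
PGVECTOR_METRICS = {
    "cosine":        ("<=>", "vector_cosine_ops"),
    "l2":            ("<->", "vector_l2_ops"),
    "inner_product": ("<#>", "vector_ip_ops"),  # <#> returns the NEGATIVE inner product
}

def order_by(metric: str) -> str:
    # Raises KeyError on an unknown metric instead of silently degrading.
    op, _opclass = PGVECTOR_METRICS[metric]
    return f"ORDER BY embedding {op} %s"
```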
pgvector performance can degrade under high concurrent write load, because HNSW index updates must lock the portions of the index graph they modify. Applications with high write/read ratios (real-time event streams, live document updates) may see query latency spikes during index update operations. Mitigation strategies include batching writes into low-traffic periods, using IVFFlat indexes (which support batch rebuild more efficiently than HNSW incremental updates), or maintaining a separate "staging" table with recent embeddings that is merged into the main table during off-peak maintenance windows.
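A sketch of the staging-table merge for the off-peak window; table names are illustrative and the statements assume identical schemas:

```python
# Writes land in a small, unindexed staging table during the day; the
# maintenance job moves them into the indexed main table in one transaction.
MERGE_STATEMENTS = [
    "BEGIN",
    "INSERT INTO documents (content, category, embedding) "
    "SELECT content, category, embedding FROM documents_staging",
    "TRUNCATE documents_staging",
    "COMMIT",
]
```

Until the merge runs, queries that must see fresh rows have to search both tables and merge results client-side, since staging rows are not in the main index.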