Vector Databases

Pinecone

A fully managed vector database service built for production-scale semantic search and RAG, with serverless and pod-based deployment options.

Serverless
No infra to manage
10M+ vectors
Free tier
Sub-10ms
P99 latency

Table of Contents

SECTION 01

The managed database tradeoff

Self-hosting a vector database (Qdrant, Weaviate, Milvus) gives you full control and zero per-query cost, but you own the infrastructure: provisioning, scaling, backups, upgrades, and on-call. Pinecone is the opposite bet — you pay per vector and per query, but operations are someone else's problem.

Pinecone makes sense when: your team is small, time-to-production matters, and your query volume is moderate. At very high query volumes, the per-query cost can exceed the cost of self-hosting — do the math for your workload.
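
The breakeven is simple arithmetic. A hypothetical back-of-envelope sketch (both prices below are made-up placeholders; substitute your provider's actual rates):

```python
# All numbers below are hypothetical placeholders; plug in your real rates.
def breakeven_queries_per_month(self_host_monthly_usd: float,
                                managed_cost_per_query_usd: float) -> float:
    """Query volume above which self-hosting becomes cheaper than pay-per-query."""
    return self_host_monthly_usd / managed_cost_per_query_usd

# e.g. a $600/month self-hosted cluster vs. $0.0004 per managed query:
print(breakeven_queries_per_month(600.0, 0.0004))  # roughly 1.5 million queries/month
```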

SECTION 02

Pinecone architecture

A Pinecone index stores vectors with associated metadata. Each vector has a unique ID (a string you choose), a values array (the embedding), and an optional metadata dict.

Pinecone uses a Hierarchical Navigable Small World (HNSW) or IVF index under the hood for approximate nearest-neighbour search. You configure the metric (cosine, dot product, Euclidean) at index creation time — choose cosine for normalised embeddings, dot product if you're using magnitude as a relevance signal.
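
For normalised embeddings the two metrics rank results identically, which is why cosine is the usual default. A quick pure-Python check of that equivalence (illustrative only, no Pinecone calls):

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [1.0, 2.0]
# On unit-length vectors, dot product and cosine similarity coincide,
# so either metric produces the same ranking.
assert abs(dot(l2_normalize(a), l2_normalize(b)) - cosine(a, b)) < 1e-12
```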

SECTION 03

Setup and first index

pip install pinecone  # the SDK is now published as "pinecone" (formerly pinecone-client)
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create a serverless index (free tier: 1 index, 10M vectors)
pc.create_index(
    name="my-rag-index",
    dimension=1536,           # must match your embedding model's output dims
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to the index
index = pc.Index("my-rag-index")
print(index.describe_index_stats())

SECTION 04

Upsert and query

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone(api_key="your-api-key")
index = pc.Index("my-rag-index")

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        input=[text], model="text-embedding-3-small"
    ).data[0].embedding

# Ingest documents
docs = [
    {"id": "doc-1", "text": "Refunds are processed within 5 business days.", "source": "faq"},
    {"id": "doc-2", "text": "Shipping is free on orders over $50.", "source": "faq"},
    {"id": "doc-3", "text": "Our CEO is Jane Smith.", "source": "about"},
]

vectors = [
    {
        "id": doc["id"],
        "values": embed(doc["text"]),
        "metadata": {"text": doc["text"], "source": doc["source"]}
    }
    for doc in docs
]
index.upsert(vectors=vectors)

# Query
query_emb = embed("How long do refunds take?")
results = index.query(
    vector=query_emb,
    top_k=3,
    include_metadata=True
)
for match in results.matches:
    print(f"Score {match.score:.3f}: {match.metadata['text']}")

SECTION 05

Metadata filtering

Pinecone supports filtering on metadata fields to narrow the search space before ANN — crucial for multi-tenant apps and category-scoped search:

# Only search within the "faq" source
results = index.query(
    vector=query_emb,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "faq"}}
)

# More complex filter: source is faq AND doc_type is not "archived"
results = index.query(
    vector=query_emb,
    top_k=5,
    filter={
        "$and": [
            {"source": {"$eq": "faq"}},
            {"doc_type": {"$ne": "archived"}}
        ]
    },
    include_metadata=True
)

Supported operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or.

SECTION 06

Namespaces for multi-tenancy

Namespaces partition an index so different tenants can't see each other's data, without the cost of separate indexes:

# Ingest into a tenant-specific namespace
index.upsert(
    vectors=[{"id": "doc-1", "values": embed("..."), "metadata": {...}}],
    namespace="tenant-abc"
)

# Query only within that tenant's namespace
results = index.query(
    vector=query_emb,
    top_k=5,
    namespace="tenant-abc",
    include_metadata=True
)

# Delete all data for a tenant (GDPR compliance)
index.delete(delete_all=True, namespace="tenant-abc")

SECTION 07

Gotchas

Dimension is permanent. You choose the vector dimension at index creation and cannot change it. If you switch embedding models, you must recreate the index and re-embed everything.
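
Because a mismatch only surfaces as an error at upsert time, a cheap guard before writing can save confusion, especially right after switching models. A sketch (the dimension table covers the OpenAI models used in this guide; `check_dimensions` is a hypothetical helper):

```python
# Output dimensions of the embedding models referenced in this guide.
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_dimensions(vectors: list[dict], index_dim: int) -> None:
    """Raise early if any vector won't fit the index's fixed dimension."""
    bad = [v["id"] for v in vectors if len(v["values"]) != index_dim]
    if bad:
        raise ValueError(f"dimension mismatch (index expects {index_dim}): {bad}")

# Passes silently for a correctly sized vector
check_dimensions([{"id": "doc-1", "values": [0.0] * 1536}],
                 MODEL_DIMS["text-embedding-3-small"])
```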

Metadata is not free. Large metadata dicts increase storage costs and slow down filtering. Store only the fields you'll actually filter on; keep the full document text in a separate store (Postgres, S3).

Serverless cold starts. Serverless indexes that haven't been queried recently may have a brief cold-start latency spike on the first query after inactivity. For SLA-sensitive endpoints, use pod-based indexes or keep a warm-up pinger.

Eventual consistency. Upserted vectors are not immediately queryable — there's a short propagation delay (usually <1 second). Don't assume a vector is searchable immediately after upsert in time-sensitive tests.
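
In time-sensitive code, polling beats a fixed sleep. A generic sketch: in real use, `get_count` would wrap `index.describe_index_stats().total_vector_count` (the helper itself is hypothetical, not part of the SDK):

```python
import time

def wait_until_indexed(get_count, expected: int,
                       timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll until at least `expected` vectors are visible, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_count() >= expected:
            return True
        time.sleep(interval)
    return False

# In real use:
# wait_until_indexed(lambda: index.describe_index_stats().total_vector_count,
#                    expected=len(vectors))
```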

Serverless Architecture & Scaling

Pinecone's serverless tier scales automatically with the workload: as query volume or dataset size grows or shrinks, the underlying resources are adjusted transparently, with no instances to provision or capacity to plan. Costs track actual usage (storage, reads, writes) rather than reserved capacity, which for variable traffic patterns can significantly undercut self-managed deployments. The trade-offs are occasional extra latency and less direct control over infrastructure tuning, acceptable for most applications.

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("documents")

# No capacity planning: the same calls work whether the index
# holds a thousand vectors or a hundred million.
index.upsert(vectors=[
    {"id": "id-1", "values": [0.1, 0.2, ...], "metadata": {"source": "doc1"}},
    {"id": "id-2", "values": [0.3, 0.4, ...], "metadata": {"source": "doc2"}},
])

results = index.query(
    vector=[0.2, 0.3, ...],
    top_k=10,
    filter={"source": {"$eq": "doc1"}},
    include_metadata=True
)

Component     Function                      Configuration
Vector Index  Stores and searches vectors   Dimension, metric, pods
Metadata      Rich filtering capability     Custom attributes
Namespaces    Logical data isolation        Per-tenant separation
Replication   High availability             Replica count

As RAG applications proliferate, managed vector services like Pinecone reduce friction and shorten development cycles: teams can focus on their core ML and application logic rather than database infrastructure.

Metadata filtering extends similarity search with precise control: rich metadata (timestamps, user IDs, document sources, custom attributes) supports date-range filtering, user-scoped searches, and source-specific queries layered on top of vector similarity.

Namespaces enable multi-tenancy at scale: rather than managing separate indexes per user or project, they provide logical isolation within a single index, which makes them a cost-effective fit for SaaS and other multi-tenant deployments.

Pinecone handles failover and disaster recovery automatically: replication keeps vector data available through component failures, which matters for production systems where downtime directly impacts users.

The serverless architecture handles most performance tuning automatically, but understanding your query patterns and data distribution still pays off: top_k size, filter selectivity, and metadata payload size all affect latency and cost. Pinecone's documentation covers these levers for demanding workloads.

Near-real-time indexing makes new data searchable shortly after upsert, typically within about a second (see the eventual-consistency gotcha above), without the scheduled reindexing jobs that batch-based systems require.

Hybrid search in Pinecone combines vector similarity with keyword matching, leveraging the strengths of both approaches. Vector search excels at semantic understanding while keyword search handles exact matches. The combination captures benefits of both modalities, improving search quality for many applications. Understanding when and how to use hybrid search effectively enhances retrieval system quality.
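
One widely used pattern is convex weighting of the dense and sparse query components before the call: alpha=1 means pure dense, alpha=0 pure sparse. A sketch of that scaling step (pure arithmetic; the scaled parts would then be passed to `index.query` as `vector=` and `sparse_vector=`):

```python
def hybrid_scale(dense: list[float], sparse: dict, alpha: float):
    """Weight dense vs. sparse query components; alpha in [0, 1]."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    scaled_dense = [v * alpha for v in dense]
    return scaled_dense, scaled_sparse

d, s = hybrid_scale([1.0, 1.0], {"indices": [3, 7], "values": [2.0, 4.0]}, alpha=0.75)
print(d, s["values"])  # → [0.75, 0.75] [0.5, 1.0]
```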

Pinecone costs scale with both storage volume and query throughput. Understanding your cost drivers (storage, queries, compute) enables optimization: some applications benefit from aggressive pruning of old data, others from caching frequent queries. A data-driven approach keeps the deployment economical as it scales.

Pinecone's roadmap continues to add community-requested capabilities; recent additions include sparse vector support, which enables hybrid dense-sparse retrieval for more sophisticated scenarios.

Adopting Pinecone as your vector store simplifies the operational complexity of managing vector search infrastructure. The managed service approach and transparent pricing enable teams to focus on application value rather than database administration. As vector search becomes increasingly important in AI applications, platforms like Pinecone provide essential infrastructure supporting innovation in retrieval-augmented generation and semantic search applications.