Concept Articles

GenAI Deep Dives

In-depth articles on the most complex topics — comparison tables, Python code examples, method cards, and references.

Retrieval & RAG
Text Embeddings
Embedding models, MTEB benchmarks, Matryoshka dimensions, fine-tuning, cross-encoder reranking
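As a taste of the material: a minimal sketch of Matryoshka-style dimension truncation, assuming an embedding model trained so the leading dimensions carry the most information (the function name is illustrative, not a library API).

```python
import math

def matryoshka_truncate(vec, dim):
    """Keep the first `dim` dimensions of an embedding and re-normalize.

    Only meaningful for Matryoshka-trained models, which front-load
    information into the leading dimensions.
    """
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard zero vectors
    return [x / norm for x in head]

# A toy 3-d "embedding" cut to 2 dims, back on the unit sphere
small = matryoshka_truncate([3.0, 4.0, 12.0], dim=2)
```

Cosine similarity on the truncated vectors then trades a little quality for much cheaper storage and search.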
Retrieval Technology
BM25, FAISS, HNSW, hybrid RRF — how to pick and tune a retrieval stack
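A flavour of the hybrid part: Reciprocal Rank Fusion fits in a few lines. The 1/(k + rank) formula and the k=60 default are standard; the doc IDs below are made up.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc IDs.

    Each document earns 1 / (k + rank) from every list it appears in;
    scores are summed and documents re-sorted. k=60 is the common default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a BM25 ranking with a dense-vector ranking (toy doc IDs)
fused = rrf_fuse([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
```

Documents ranked well by both retrievers float to the top without any score calibration between the two systems.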
Advanced RAG Patterns
HyDE, reranking, hybrid search, multi-hop retrieval, and RAGAS evaluation
Vector Databases
pgvector, Qdrant, Weaviate, Pinecone, Chroma — comparison and selection guide
Golden Datasets
Building eval datasets, quality criteria, annotation pipelines, and versioning
Chunking Strategies
Fixed, recursive, semantic, and hierarchical chunking — how splitting shapes retrieval quality
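The simplest of the four, fixed-size chunking with overlap, fits in a few lines (character-based here; production splitters usually count tokens):

```python
def chunk_fixed(text, size=200, overlap=50):
    """Fixed-size character chunking with overlapping windows.

    Overlap keeps sentences that straddle a boundary retrievable
    from at least one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Recursive and semantic splitters refine this by preferring natural boundaries (paragraphs, sentences, embedding shifts) over hard character cuts.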
Data Ingestion Pipelines
PDF, HTML, database, and API ingestion — building reliable document processing at scale
Post-Retrieval Processing
Reranking, context compression, RRF fusion, and context window management
Unstructured.io Parsing
Partition functions, element types, chunking strategies, cloud vs local — turning documents into RAG-ready chunks
Docling Document Conversion
IBM's structured document converter — layout analysis, TableFormer table extraction, Markdown export for RAG
GraphRAG
Microsoft's knowledge graph approach — entity extraction, community detection, global + local query modes
Contextual Retrieval
Anthropic's technique — prepend chunk-specific context to cut retrieval failures by 49%
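The core move, sketched with a hypothetical `generate_context` standing in for the LLM call:

```python
def contextualize(chunks, document, generate_context):
    """Prepend chunk-specific context before embedding and indexing.

    `generate_context(document, chunk)` is a hypothetical callable that
    returns a short sentence situating the chunk in the whole document;
    in Anthropic's setup this is an LLM call over the full document.
    """
    return [generate_context(document, c) + "\n\n" + c for c in chunks]

# Stub context writer; a real one prompts an LLM with document + chunk
stub = lambda doc, chunk: "This chunk is from the Q2 revenue section."
augmented = contextualize(["Revenue grew 3%."], "full filing text", stub)
```

The augmented chunk, not the bare one, is what gets embedded and BM25-indexed, so ambiguous fragments stay findable.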
Agentic RAG
Multi-step retrieval, CRAG, Self-RAG, query decomposition — RAG with planning and self-correction
Production & Infrastructure
MLOps for LLMs
Prompt CI/CD, model registry, drift detection, canary deploys, and cost management
AI Hardware Guide
H100 vs A100 vs MI300X, cloud vs on-prem, memory math, and interconnects
LLM Monitoring
Traces, quality scoring, cost tracking, and drift detection in production
Cloud Deployment
AWS, GCP, Azure — managed APIs vs self-hosted, auto-scaling, and cost optimisation
State & Session Management
Conversation history, Redis stores, multi-user isolation, stateful agents
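One recurring building block, a sliding-window trim of conversation history (OpenAI-style message dicts assumed):

```python
def trim_history(messages, max_turns=10):
    """Keep system messages plus the last `max_turns` non-system messages.

    A production store (e.g. Redis) would persist the full log and apply
    a trim like this only when assembling the prompt.
    """
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_turns:]
    return system + recent

history = [{"role": "system", "content": "be brief"}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(15)]
window = trim_history(history, max_turns=10)
```

Summarization-based memory replaces the dropped messages with an LLM-written digest instead of discarding them outright.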
LLM Dev Frameworks
LangChain, LlamaIndex, Haystack, Semantic Kernel — selecting the right framework for production
Data Governance for AI
PII handling, GDPR compliance, data lineage, consent management, and quality controls
LLM Traffic & Cost Management
Token budgets, prompt caching, semantic caching, model routing, and budget alerts in production
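The cheapest caching tier, exact-match prompt caching, as a sketch (`make_cached` is an illustrative name; semantic caching would match on embedding similarity instead):

```python
import hashlib

def make_cached(call_model):
    """Wrap a model call with an exact-match cache keyed on (model, prompt)."""
    cache = {}
    def cached_call(model, prompt):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key not in cache:
            cache[key] = call_model(model, prompt)  # miss: pay for the call
        return cache[key]
    return cached_call

# Demo with a stub model; a real `call_model` would hit an LLM API
calls = []
def stub_model(model, prompt):
    calls.append(prompt)
    return f"echo:{prompt}"

ask = make_cached(stub_model)
ask("tiny", "hello")
ask("tiny", "hello")  # second call is a cache hit, no model invocation
```

An in-memory dict suffices for a demo; production setups back the cache with Redis and add TTLs so stale completions expire.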
AI Architecture Decision Frameworks
Build vs buy, RAG vs fine-tune, in-context learning — structured frameworks for architecture decisions
vLLM
PagedAttention, continuous batching, OpenAI-compatible serving — 24× throughput improvement
LiteLLM
Unified interface to 100+ LLM providers — routing, fallbacks, cost tracking, and proxy server
LLM Streaming
SSE, async generators, FastAPI StreamingResponse — token-by-token delivery end-to-end
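At its core, token-by-token delivery reduces to an async generator. This toy version splits on whitespace instead of calling a model; a real SSE endpoint (e.g. FastAPI's StreamingResponse) would forward each yielded token to the client as it arrives.

```python
import asyncio

async def stream_tokens(text, delay=0.0):
    """Toy token stream: yields one whitespace-split token at a time.

    Stands in for an LLM client's streaming iterator.
    """
    for token in text.split():
        await asyncio.sleep(delay)  # simulate per-token generation latency
        yield token

async def consume():
    return [tok async for tok in stream_tokens("tokens arrive one by one")]

tokens = asyncio.run(consume())
```

The consumer sees tokens as soon as they are produced, which is what turns a 20-second generation into a sub-second time-to-first-token.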
Cost–Quality–Speed Triangle
Model tiering, LLM cascade, caching strategies — navigating the iron triangle in production
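The cascade idea in miniature: try tiers cheapest-first and escalate when a (hypothetical) `accept` check rejects the answer.

```python
def cascade(prompt, tiers, accept):
    """Try models cheapest-first; escalate while `accept` rejects the answer.

    `tiers` is a list of (name, call) pairs ordered cheap -> expensive and
    `accept` is a stand-in for a quality check (a verifier model, a rubric,
    or self-reported confidence). The last tier always answers.
    """
    for name, call in tiers[:-1]:
        answer = call(prompt)
        if accept(answer):
            return name, answer
    name, call = tiers[-1]
    return name, call(prompt)

# Stub tiers: the small model hedges, so the request escalates
tiers = [("small", lambda p: "unsure"), ("large", lambda p: "confident")]
result = cascade("hard question", tiers, accept=lambda a: a != "unsure")
```

If most traffic is easy, most requests stop at the cheap tier, which is where the cost savings on the triangle come from.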

