Concept Articles

GenAI Deep Dives

In-depth articles on the most complex topics — comparison tables, Python code examples, method cards, and references.

Retrieval & RAG
Text Embeddings
Embedding models, MTEB benchmarks, Matryoshka dimensions, fine-tuning, cross-encoder reranking
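As a taste of the material: a minimal sketch of Matryoshka-style dimension truncation, assuming an embedding model trained so the leading dimensions carry the most information (the function name is illustrative, not a library API).

```python
import math

def matryoshka_truncate(vec, dim):
    """Keep the first `dim` dimensions of an embedding and re-normalize.

    Only meaningful for Matryoshka-trained models, which front-load
    information into the leading dimensions.
    """
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard zero vectors
    return [x / norm for x in head]

# A toy 3-d "embedding" cut to 2 dims, back on the unit sphere
small = matryoshka_truncate([3.0, 4.0, 12.0], dim=2)
```

Cosine similarity on the truncated vectors then trades a little quality for much cheaper storage and search.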
Retrieval Technology
BM25, FAISS, HNSW, hybrid RRF — how to pick and tune a retrieval stack
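A flavour of the hybrid part: Reciprocal Rank Fusion fits in a few lines. The 1/(k + rank) formula and the k=60 default are standard; the doc IDs below are made up.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc IDs.

    Each document earns 1 / (k + rank) from every list it appears in;
    scores are summed and documents re-sorted. k=60 is the common default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a BM25 ranking with a dense-vector ranking (toy doc IDs)
fused = rrf_fuse([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
```

Documents ranked well by both retrievers float to the top without any score calibration between the two systems.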
Advanced RAG Patterns
HyDE, reranking, hybrid search, multi-hop retrieval, and RAGAS evaluation
Vector Databases
pgvector, Qdrant, Weaviate, Pinecone, Chroma — comparison and selection guide
Golden Datasets
Building eval datasets, quality criteria, annotation pipelines, and versioning
Chunking Strategies
Fixed, recursive, semantic, and hierarchical chunking — how splitting shapes retrieval quality
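The simplest of the four, fixed-size chunking with overlap, fits in a few lines (character-based here; production splitters usually count tokens):

```python
def chunk_fixed(text, size=200, overlap=50):
    """Fixed-size character chunking with overlapping windows.

    Overlap keeps sentences that straddle a boundary retrievable
    from at least one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Recursive and semantic splitters refine this by preferring natural boundaries (paragraphs, sentences, embedding shifts) over hard character cuts.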
Data Ingestion Pipelines
PDF, HTML, database, and API ingestion — building reliable document processing at scale
Post-Retrieval Processing
Reranking, context compression, RRF fusion, and context window management
Unstructured.io Parsing
Partition functions, element types, chunking strategies, cloud vs local — turning documents into RAG-ready chunks
Docling Document Conversion
IBM's structured document converter — layout analysis, TableFormer table extraction, Markdown export for RAG
GraphRAG
Microsoft's knowledge graph approach — entity extraction, community detection, global + local query modes
Contextual Retrieval
Anthropic's technique — prepend chunk-specific context to cut retrieval failures by 49%
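The core move, sketched with a hypothetical `generate_context` standing in for the LLM call:

```python
def contextualize(chunks, document, generate_context):
    """Prepend chunk-specific context before embedding and indexing.

    `generate_context(document, chunk)` is a hypothetical callable that
    returns a short sentence situating the chunk in the whole document;
    in Anthropic's setup this is an LLM call over the full document.
    """
    return [generate_context(document, c) + "\n\n" + c for c in chunks]

# Stub context writer; a real one prompts an LLM with document + chunk
stub = lambda doc, chunk: "This chunk is from the Q2 revenue section."
augmented = contextualize(["Revenue grew 3%."], "full filing text", stub)
```

The augmented chunk, not the bare one, is what gets embedded and BM25-indexed, so ambiguous fragments stay findable.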
Agentic RAG
Multi-step retrieval, CRAG, Self-RAG, query decomposition — RAG with planning and self-correction
Production & Infrastructure
MLOps for LLMs
Prompt CI/CD, model registry, drift detection, canary deploys, and cost management
AI Hardware Guide
H100 vs A100 vs MI300X, cloud vs on-prem, memory math, and interconnects
LLM Monitoring
Traces, quality scoring, cost tracking, and drift detection in production
Cloud Deployment
AWS, GCP, Azure — managed APIs vs self-hosted, auto-scaling, and cost optimisation
State & Session Management
Conversation history, Redis stores, multi-user isolation, stateful agents
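One recurring building block, a sliding-window trim of conversation history (OpenAI-style message dicts assumed):

```python
def trim_history(messages, max_turns=10):
    """Keep system messages plus the last `max_turns` non-system messages.

    A production store (e.g. Redis) would persist the full log and apply
    a trim like this only when assembling the prompt.
    """
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_turns:]
    return system + recent

history = [{"role": "system", "content": "be brief"}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(15)]
window = trim_history(history, max_turns=10)
```

Summarization-based memory replaces the dropped messages with an LLM-written digest instead of discarding them outright.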
LLM Dev Frameworks
LangChain, LlamaIndex, Haystack, Semantic Kernel — selecting the right framework for production
Data Governance for AI
PII handling, GDPR compliance, data lineage, consent management, and quality controls
LLM Traffic & Cost Management
Token budgets, prompt caching, semantic caching, model routing, and budget alerts in production
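The cheapest caching tier, exact-match prompt caching, as a sketch (`make_cached` is an illustrative name; semantic caching would match on embedding similarity instead):

```python
import hashlib

def make_cached(call_model):
    """Wrap a model call with an exact-match cache keyed on (model, prompt)."""
    cache = {}
    def cached_call(model, prompt):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key not in cache:
            cache[key] = call_model(model, prompt)  # miss: pay for the call
        return cache[key]
    return cached_call

# Demo with a stub model; a real `call_model` would hit an LLM API
calls = []
def stub_model(model, prompt):
    calls.append(prompt)
    return f"echo:{prompt}"

ask = make_cached(stub_model)
ask("tiny", "hello")
ask("tiny", "hello")  # second call is a cache hit, no model invocation
```

An in-memory dict suffices for a demo; production setups back the cache with Redis and add TTLs so stale completions expire.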
AI Architecture Decision Frameworks
Build vs buy, RAG vs fine-tune, in-context learning — structured frameworks for architecture decisions
vLLM
PagedAttention, continuous batching, OpenAI-compatible serving — 24× throughput improvement
LiteLLM
Unified interface to 100+ LLM providers — routing, fallbacks, cost tracking, and proxy server
LLM Streaming
SSE, async generators, FastAPI StreamingResponse — token-by-token delivery end-to-end
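At its core, token-by-token delivery reduces to an async generator. This toy version splits on whitespace instead of calling a model; a real SSE endpoint (e.g. FastAPI's StreamingResponse) would forward each yielded token to the client as it arrives.

```python
import asyncio

async def stream_tokens(text, delay=0.0):
    """Toy token stream: yields one whitespace-split token at a time.

    Stands in for an LLM client's streaming iterator.
    """
    for token in text.split():
        await asyncio.sleep(delay)  # simulate per-token generation latency
        yield token

async def consume():
    return [tok async for tok in stream_tokens("tokens arrive one by one")]

tokens = asyncio.run(consume())
```

The consumer sees tokens as soon as they are produced, which is what turns a 20-second generation into a sub-second time-to-first-token.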
Cost–Quality–Speed Triangle
Model tiering, LLM cascade, caching strategies — navigating the iron triangle in production
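The cascade idea in miniature: try tiers cheapest-first and escalate when a (hypothetical) `accept` check rejects the answer.

```python
def cascade(prompt, tiers, accept):
    """Try models cheapest-first; escalate while `accept` rejects the answer.

    `tiers` is a list of (name, call) pairs ordered cheap -> expensive and
    `accept` is a stand-in for a quality check (a verifier model, a rubric,
    or self-reported confidence). The last tier always answers.
    """
    for name, call in tiers[:-1]:
        answer = call(prompt)
        if accept(answer):
            return name, answer
    name, call = tiers[-1]
    return name, call(prompt)

# Stub tiers: the small model hedges, so the request escalates
tiers = [("small", lambda p: "unsure"), ("large", lambda p: "confident")]
result = cascade("hard question", tiers, accept=lambda a: a != "unsure")
```

If most traffic is easy, most requests stop at the cheap tier, which is where the cost savings on the triangle come from.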

