Visualization for ML projects: EDA, training-loss curves, attention heatmaps, and embedding-space exploration.
Numbers in a loss log don't tell you whether training is healthy; a plot does. Visualization is how you catch diverging runs, class imbalance, and degenerate attention patterns before they waste GPU time.
```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data for illustration
raw_losses = np.random.rand(1000)
epoch1_data, epoch2_data = np.random.rand(2, 100)

# WRONG: plotting raw loss values (too noisy)
plt.plot(raw_losses)  # bouncy, hard to see the trend

# RIGHT: use a moving average or other smoothing
window = 50
smoothed = np.convolve(raw_losses, np.ones(window) / window, mode='valid')
plt.plot(smoothed, label='Loss (smoothed)')

# WRONG: linear scale for diverging losses
# (one bad spike flattens everything else on the plot)
plt.yscale('log')  # RIGHT: log scale shows all the dynamics

# WRONG: plotting into the same axes without clearing
plt.plot(epoch1_data)
plt.plot(epoch2_data)  # overlaps the previous curve, confusing

# RIGHT: separate axes (or call plt.clf() between plots)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(epoch1_data, label='Epoch 1')
ax2.plot(epoch2_data, label='Epoch 2')
```

| Tool | Use Case | Real-time | Static Export |
|---|---|---|---|
| Matplotlib | Publication-quality plots, full control | Limited | Excellent |
| Plotly | Interactive dashboards, web deployment | Yes | Good |
| Wandb/TensorBoard | Real-time training monitoring | Yes | Built-in |
| Seaborn | Statistical plots, aesthetics | No | Good |
Choosing the right plot type: For continuous metrics like training loss, line plots with smoothing are standard. For per-batch statistics with high variance, scatter plots with a smoothed trend line provide both detail and overview. For classification metrics at different thresholds, precision-recall curves or ROC curves are essential. For attention patterns, heatmaps with proper color normalization reveal what the model attends to. The plot type should match the story you want to tell.
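As a concrete instance of matching plot to metric, a precision-recall sweep can be computed directly with NumPy; the labels, scores, and threshold grid below are invented for illustration, not taken from a real model:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted plotting
import matplotlib.pyplot as plt

# Hypothetical binary labels and scores (positives tend to score higher)
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = labels * 0.5 + rng.random(200) * 0.7

# Sweep thresholds; compute precision and recall at each
thresholds = np.linspace(0.0, 1.2, 50)
precision, recall = [], []
for t in thresholds:
    pred = scores >= t
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    precision.append(tp / (tp + fp) if tp + fp else 1.0)
    recall.append(tp / (tp + fn) if tp + fn else 0.0)

fig, ax = plt.subplots()
ax.plot(recall, precision)
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
```

In practice a library routine (e.g. scikit-learn's `precision_recall_curve`) replaces the manual loop; the sketch just shows what the curve is made of.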
Color and style choices matter for accessibility and clarity. Colorblind-friendly palettes (such as viridis or cividis) should be the default. Line styles (solid, dashed, dotted) help distinguish curves when color alone is insufficient. Legends should be positioned to minimize occlusion of data, and axes should be labeled with units and limits suited to the data. Many models fail silently because training curves weren't monitored; investing in good visualization practices catches problems early.
Saving plots for papers or presentations requires explicit dpi and format choices. PNG at 300 dpi is standard for web; PDF is better for printing to avoid rasterization artifacts. Using matplotlib's constrained layout (constrained_layout=True) prevents label cutoff in saved figures. Integrating matplotlib with wandb or tensorboard enables both static exports and interactive exploration during training.
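A minimal save routine along these lines (the filenames, figure size, and labels are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# constrained_layout prevents label cutoff in the saved file
fig, ax = plt.subplots(figsize=(6, 4), constrained_layout=True)
ax.plot(np.linspace(0, 1, 100), label="metric")
ax.set_xlabel("Step")
ax.set_ylabel("Value")
ax.legend()

fig.savefig("figure.png", dpi=300)  # raster at 300 dpi, for web
fig.savefig("figure.pdf")           # vector, avoids rasterization in print
```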
Beyond loss curves, visualizing model internals provides insights into learning dynamics. Activation distributions (histograms of hidden unit values) show whether neurons are becoming dead (always outputting zero) or if gradients vanish (distributions converging to zero variance). Weight distributions reveal whether initialization is appropriate or if gradients have exploded. These visualizations are indispensable for debugging training instabilities.
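A sketch of the activation-histogram check, with synthetic activations standing in for real hidden states (the "healthy" and "mostly dead" distributions are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Post-ReLU activations: a healthy layer vs. one whose pre-activations
# sit far below zero, so most units output exactly zero ("dead")
healthy = np.maximum(rng.normal(0.5, 1.0, 10_000), 0)
dying = np.maximum(rng.normal(-2.0, 0.5, 10_000), 0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.hist(healthy, bins=50)
ax1.set_title("Healthy activations")
ax2.hist(dying, bins=50)
ax2.set_title("Mostly dead units")

dead_frac = np.mean(dying == 0)  # fraction of units outputting zero
```

Tracking `dead_frac` per layer over training is a cheap numeric companion to the histogram itself.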
Attention visualization in transformers has become a standard tool for interpretability research. Heatmaps showing which tokens each head attends to reveal linguistic structure learned by the model. Some attention patterns match linguistic intuitions (attending to related words), while others reveal that attention heads specialize in other tasks (position tracking, punctuation handling). Visualizing attention has democratized transformer interpretability.
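A minimal attention heatmap, assuming made-up attention logits for a single head over a toy sentence (a real plot would take the weights from the model's attention layer):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

tokens = ["The", "cat", "sat", "on", "the", "mat"]
rng = np.random.default_rng(0)
# Hypothetical logits; softmax over the key dimension gives attention weights
logits = rng.normal(size=(len(tokens), len(tokens)))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

fig, ax = plt.subplots()
# Fix the color scale to [0, 1] so heads are comparable across plots
im = ax.imshow(attn, cmap="viridis", vmin=0, vmax=1)
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=45)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
ax.set_xlabel("Key")
ax.set_ylabel("Query")
fig.colorbar(im, ax=ax)
```

Pinning `vmin`/`vmax` is the "proper color normalization" mentioned above: without it, each head gets its own scale and patterns are not comparable.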
Embedding visualization through t-SNE or UMAP projects high-dimensional learned representations to 2D for human inspection. Clusters in embedding space can indicate that the model has learned meaningful semantic or syntactic structure. Monitoring embedding visualization across training epochs can diagnose when the model transitions from memorization to generalization.
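t-SNE and UMAP need extra dependencies (scikit-learn, umap-learn), so as a self-contained stand-in the sketch below uses a plain PCA projection via NumPy's SVD; the workflow (project to 2D, scatter, look for clusters) is the same, and the 64-d embeddings with two planted clusters are synthetic:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic 64-d "embeddings" with two planted clusters
a = rng.normal(0.0, 1.0, (100, 64))
b = rng.normal(4.0, 1.0, (100, 64))
emb = np.vstack([a, b])

# PCA: center, then project onto the top-2 principal components
centered = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ vt[:2].T

fig, ax = plt.subplots()
ax.scatter(proj[:100, 0], proj[:100, 1], s=10, label="cluster A")
ax.scatter(proj[100:, 0], proj[100:, 1], s=10, label="cluster B")
ax.legend()
```

Swapping in `sklearn.manifold.TSNE(n_components=2).fit_transform(emb)` for the SVD lines gives the nonlinear version discussed above.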
Integration with modern ML platforms like Weights & Biases (W&B) or MLflow enables automated plot generation and versioning. These platforms embed matplotlib plots in dashboards, compare runs visually, and export publication-ready figures. Learning matplotlib deeply still pays dividends: both platforms can log matplotlib figures directly, and understanding matplotlib enables sophisticated custom visualizations beyond their built-in charts.
Creating reproducible plots requires controlling randomness: seeds for data shuffling and sampling should be fixed, and color map choices should be deterministic. Recording figure metadata (DPI, figsize, font sizes) in the plotting code ensures plots can be regenerated if the raw data changes. In team environments, matplotlib style settings (matplotlib.rcParams or style sheets) enforce consistency across plots generated by different team members.
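A minimal shared-style sketch using rcParams (the specific values are arbitrary examples, not a recommendation; a team would typically keep them in a `.mplstyle` file instead):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical team-wide defaults, applied once at import time
style = {
    "figure.dpi": 150,
    "figure.figsize": (6, 4),
    "font.size": 11,
    "axes.grid": True,
    "image.cmap": "viridis",
}
matplotlib.rcParams.update(style)

fig, ax = plt.subplots()  # picks up figsize and dpi from rcParams
```

The same dictionary, written as `key: value` lines in a style sheet, can be loaded with `plt.style.use(...)` so every script shares one source of truth.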
For paper submissions, plot quality matters for acceptance and impact. Publication-quality plots use appropriate color schemes, font sizes readable at print size (not screen size), proper axis labels with units, and legends positioned to avoid occlusion. Matplotlib's constrained_layout feature handles spacing automatically. Investing time in visualization quality during development saves days of scrambling before paper deadlines.
Publishing research requires publication-quality visualizations. Matplotlib enables precise control over every detail: font families, sizes, colors, line styles, marker shapes, annotations, and layouts. This control is both powerful and tedious, but necessary for figures that will be printed, projected, and scrutinized by reviewers. Developing a personal matplotlib style template (saved settings and utility functions) pays dividends across papers and projects.
Color choice carries subtle implications: red/green combinations are problematic for colorblind readers, and warm colors (red, orange) typically indicate errors or problems while cool colors (blue, green) suggest success or improvement. Using colorblind-friendly palettes like viridis or cividis is now standard practice. Publication guides often specify color-space requirements (RGB for screens, CMYK for print); matplotlib renders in RGB, so CMYK conversion is typically handled during typesetting.
Reproducibility of figures requires version control of both data and plotting code. Matplotlib scripts that generate figures should be committed to version control alongside paper manuscripts. When figures are regenerated with updated data, the plotting code remains stable. This enables future regeneration if paper revisions require updated plots, a scenario that happens frequently before publication.