Foundations · Python

Python Ecosystem for ML

NumPy arrays, pandas DataFrames, Jupyter notebooks, and the modern ML toolchain for building with LLMs

12 libraries · 7 sections · Python-first examples
Contents
  1. Why Python for ML
  2. Scientific Python
  3. NumPy & pandas
  4. Visualization
  5. Jupyter notebooks
  6. Environment management
  7. Modern toolchain
01 — Foundation

Why Python for ML and LLMs

Python has become the lingua franca of machine learning because it combines readability, an extensive library ecosystem, and interactive workflows. From data exploration to model training to deployment, the Python ecosystem (PyData, FastAPI, pytest) provides production-grade tools at every stage.

The Python ML Stack

The modern Python ML workflow is built from layered abstractions: NumPy arrays at the bottom, pandas and scikit-learn on top of them, and deep-learning frameworks such as PyTorch above those.

💡 Pro tip: Learn NumPy deeply. Most ML libraries (PyTorch, pandas, scikit-learn) build on NumPy's array model. Mastering arrays = understanding everything else.
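
The tip above is easiest to see in action with broadcasting, the core rule of NumPy's array model. A minimal sketch (all values are illustrative):

```python
import numpy as np

# Broadcasting: NumPy stretches compatible shapes instead of copying data
row = np.array([1.0, 2.0, 3.0])        # shape (3,)
col = np.array([[10.0], [20.0]])       # shape (2, 1)

grid = col + row                        # shapes (2,1) + (3,) -> (2, 3)
print(grid)
# [[11. 12. 13.]
#  [21. 22. 23.]]

# The same rule powers feature scaling: subtract a per-column mean
X = np.array([[1.0, 10.0], [3.0, 30.0]])
centered = X - X.mean(axis=0)           # mean has shape (2,), broadcast over rows
```

The same shape-stretching rule underlies feature scaling, masking, and most vectorized ML code, which is why arrays repay deep study.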
02 — Scientific Python

The Scientific Python Stack

PyData (NumPy, pandas, SciPy, Matplotlib) is the backbone of ML in Python. These libraries provide efficient n-dimensional arrays, DataFrames, statistical functions, and visualization.

Quick Setup

# Install the PyData stack (shell)
pip install numpy pandas scipy scikit-learn matplotlib seaborn jupyter

# Standard imports in ML notebooks
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization defaults
sns.set_style("darkgrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 10
03 — Data Structures

NumPy Arrays & pandas DataFrames

NumPy arrays are the foundational data structure for numerical computing in Python. pandas DataFrames extend arrays to tabular data with named columns and mixed types.

NumPy Essentials

# NumPy arrays are the foundation
import numpy as np

# Create arrays
a = np.array([1, 2, 3])          # 1D array
b = np.array([[1, 2], [3, 4]])   # 2D array (matrix)
c = np.zeros((3, 4))             # 3x4 matrix of zeros
d = np.ones((2, 3))              # 2x3 matrix of ones
e = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]

# Operations
result = a + 5                   # Broadcast scalar: [6, 7, 8]
result = a * a                   # Element-wise multiply: [1, 4, 9]
result = np.dot(a, a)            # Dot product (14)
result = np.linalg.norm(a)       # L2 norm: sqrt(14)

# Reshaping
x = np.arange(12).reshape(3, 4)  # 3x4 matrix
flat = x.flatten()               # 1D array
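
Two idioms worth adding to the essentials above are boolean masking and fancy indexing, which most ML preprocessing relies on. A short sketch with made-up values:

```python
import numpy as np

scores = np.array([55, 91, 78, 42, 88])

# Boolean mask: an element-wise condition selects matching entries
passing = scores[scores >= 60]             # array([91, 78, 88])

# Fancy indexing: pick arbitrary positions by integer index
top_two = scores[np.argsort(scores)[-2:]]  # two highest scores, ascending

# Conditional update on a copy: floor low scores at 50
clipped = scores.copy()
clipped[clipped < 50] = 50
```

Both idioms return results without writing a Python loop, which is the main reason vectorized NumPy code is fast.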

pandas DataFrames for ML

# pandas DataFrames for tabular data
import pandas as pd

# Create from dict
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 28],
    'salary': [50000, 60000, 55000]
})

# Read CSV
df = pd.read_csv('data.csv')

# Explore
print(df.head())      # First rows
print(df.describe())  # Statistics
print(df.dtypes)      # Column types
df.info()             # Columns, dtypes, memory usage (prints directly)

# Transform
df['bonus'] = df['salary'] * 0.1              # New column
df_filtered = df[df['age'] > 25]              # Filter rows
grouped = df.groupby('age')['salary'].mean()  # Aggregations
df_clean = df.dropna()                        # Remove rows with NaN

# Export
df.to_csv('output.csv', index=False)
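
The groupby line above is one half of the split-apply-combine pattern; joining frames with merge is the other everyday operation. A sketch (column names and values are illustrative):

```python
import pandas as pd

# Two small frames sharing the key column 'dept'
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'dept': ['eng', 'eng', 'sales'],
    'salary': [50000, 60000, 55000],
})
budgets = pd.DataFrame({'dept': ['eng', 'sales'], 'budget': [200000, 80000]})

# SQL-style left join on the shared key
merged = employees.merge(budgets, on='dept', how='left')

# Split-apply-combine: average salary per department
avg = merged.groupby('dept')['salary'].mean()
print(avg['eng'])   # 55000.0
```

`how='left'` keeps every employee row even when a department has no budget entry; `how='inner'` would drop unmatched rows instead.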
04 — Visualization

matplotlib, seaborn, and plotly

Visualization is critical for ML. matplotlib offers low-level control, seaborn provides statistical interfaces, and plotly creates interactive dashboards.

Visualization Workflow

# matplotlib for publication-quality plots
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Basic plot (x and y are any matching 1D sequences)
x = np.linspace(0, 10, 50)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'o-', label='data')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.title('My Chart')
plt.savefig('plot.png', dpi=300)
plt.show()

# seaborn for statistical plots (assumes df has these columns)
sns.set_style("whitegrid")
sns.scatterplot(data=df, x='age', y='salary', hue='dept')
plt.show()

# Distribution plot
sns.histplot(data=df, x='salary', kde=True)
plt.show()

# Heatmap (correlation matrix; numeric_only skips string columns)
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.show()
05 — Interactive Development

Jupyter Notebooks & IPython

Jupyter enables interactive code exploration and documentation. IPython provides a powerful interactive shell with magic commands, rich output, and debugging tools.

IPython Magic Commands

# IPython magic commands (% = line magic, %% = cell magic)
%timeit sum(range(100))              # Average over many runs
%time result = expensive_function()  # Time a single run
%matplotlib inline                   # Display plots in the notebook
%load_ext autoreload
%autoreload 2                        # Auto-reload edited modules
%pwd                                 # Print working directory
%cd /path                            # Change directory
%ls                                  # List files
%debug                               # Enter debugger after an exception

# Run an external script
%run my_script.py

# Get help
?np.array                            # Docstring
??np.array                           # Shows source code
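
Outside IPython, the closest stdlib equivalent of %timeit is the timeit module. A sketch (square_all is a made-up example function):

```python
import timeit

# %timeit is IPython-only; the stdlib timeit module works in plain scripts
elapsed = timeit.timeit('sum(range(100))', number=10_000)
print(f"10,000 runs took {elapsed:.4f}s")

def square_all(n):
    return [i * i for i in range(n)]

# Time a function with arguments by wrapping it in a lambda
per_call = timeit.timeit(lambda: square_all(1_000), number=100) / 100
print(f"~{per_call * 1e6:.1f} µs per call")
```

Unlike a naive time.time() pair, timeit repeats the statement many times and avoids common measurement pitfalls such as garbage-collection interference.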
06 — Environment Management

Virtual Environments & Dependency Management

Isolating project dependencies prevents version conflicts. Common choices are venv (built into Python), conda (full scientific stack with pre-compiled binaries), Poetry (lock files and packaging), and uv (very fast, unified tooling).

Environment Managers Comparison

Tool    | Setup              | Speed     | Best for
venv    | Built into Python  | Fast      | Simple projects, quick sandboxes
conda   | Separate installer | Slow      | Data science, pre-compiled packages
Poetry  | pip install poetry | Medium    | Production apps, lock files
uv      | pip install uv     | Very fast | Modern projects, speed-critical

venv Workflow

# Create virtual environment
python -m venv venv

# Activate
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install packages (the pip package for PyTorch is 'torch')
pip install numpy pandas torch

# Export dependencies
pip freeze > requirements.txt

# Reproduce environment
pip install -r requirements.txt

# Deactivate
deactivate

uv Quick Start (Modern Choice)

# uv is a fast, unified Python tool
pip install uv

# Create a project with uv
uv init my_project
cd my_project

# Sync dependencies
uv sync

# Add packages
uv add numpy pandas

# Run Python inside the project venv
uv run python script.py

# Lock file (reproducible installs)
uv lock
07 — Modern ML Toolchain

The Complete ML Development Stack

Beyond data processing, modern ML projects require frameworks (PyTorch), testing (pytest), code quality (black, mypy), and experiment tracking (Weights & Biases). FastAPI enables production APIs.

PyTorch Example

# PyTorch for neural networks
import torch
import torch.nn as nn
import torch.optim as optim

# Define model
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Toy training data: 64 samples, 10 features, 2 classes
x_train = torch.randn(64, 10)
y_train = torch.randint(0, 2, (64,))

# Training loop
model = SimpleNet(10, 32, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    optimizer.zero_grad()
    output = model(x_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
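
To demystify what zero_grad / backward / step are doing, the same cycle can be written by hand in NumPy for least-squares linear regression. This is a sketch on toy data, not PyTorch's actual implementation:

```python
import numpy as np

# Toy regression problem: recover known weights from noisy observations
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    pred = X @ w                            # forward pass
    grad = 2 * X.T @ (pred - y) / len(y)    # "backward": gradient of MSE w.r.t. w
    w -= lr * grad                          # "optimizer.step()"

print(w)   # close to true_w
```

PyTorch automates exactly this: backward() computes the gradient of the loss for every parameter, and step() applies the update rule (Adam, in the example above, rather than plain gradient descent).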

Testing & Code Quality

# pytest for testing (save as test_script.py)
import pytest

def test_sum():
    assert sum([1, 2, 3]) == 6

# Shell commands for the quality toolchain:
#   pytest test_script.py -v   # Run tests
#   black my_script.py         # Auto-format code
#   mypy my_script.py          # Static type analysis
#   ruff check .               # Fast Python linter

# Profiling (cProfile)
import cProfile
cProfile.run('expensive_function()')   # See bottlenecks
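
The single test above scales up with pytest's parametrization and fixtures. A sketch (normalize is a made-up function under test):

```python
import pytest

def normalize(values):
    """Scale a list of numbers linearly to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# One test function, many cases
@pytest.mark.parametrize("values,expected", [
    ([0, 5, 10], [0.0, 0.5, 1.0]),
    ([2, 4], [0.0, 1.0]),
])
def test_normalize(values, expected):
    assert normalize(values) == expected

# Fixtures provide reusable setup shared across tests
@pytest.fixture
def sample_data():
    return [1, 2, 3]

def test_with_fixture(sample_data):
    assert normalize(sample_data) == [0.0, 0.5, 1.0]
```

Each parametrize case is reported as a separate test, so a failing input is identified immediately rather than hidden inside a loop.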

Production APIs with FastAPI

# FastAPI for serving ML models
from fastapi import FastAPI
from pydantic import BaseModel
import torch

app = FastAPI()

# Load model once on startup
model = None

@app.on_event("startup")
async def load_model():
    global model
    model = torch.load("model.pth")
    model.eval()

class PredictRequest(BaseModel):
    input: list[float]

@app.post("/predict")
async def predict(request: PredictRequest):
    with torch.no_grad():
        output = model(torch.tensor(request.input))
    return {"prediction": output.tolist()}

# Run: uvicorn script:app --reload

Experiment Tracking

Track hyperparameters, metrics, and models with Weights & Biases or MLflow.

# Weights & Biases for experiment tracking
import wandb

wandb.init(project="my-ml-project")

# Log hyperparameters
wandb.config.learning_rate = 0.001
wandb.config.batch_size = 32

# Log during training
for epoch in range(100):
    loss = train_step()
    wandb.log({"loss": loss, "epoch": epoch})

# Log model
wandb.save("model.pth")
wandb.finish()

Python ML Ecosystem Tools

NumPy (Arrays): N-dimensional arrays, linear algebra, numerical computing foundation
pandas (DataFrames): Tabular data manipulation, ETL, exploratory data analysis
matplotlib (Plotting): Publication-quality plots, low-level control, customization
seaborn (Visualization): Statistical plots, attractive defaults, rapid visualization
Jupyter (Notebooks): Interactive coding, documentation, exploratory analysis
PyTorch (Deep Learning): Dynamic graphs, GPU support, research-friendly framework
scikit-learn (Classical ML): Classification, clustering, preprocessing, evaluation metrics
LangChain (LLMs): LLM chains, RAG, agent orchestration, integrations
uv (Environment): Fast dependency management and virtual environments
FastAPI (APIs): Modern web framework for ML model serving
pytest (Testing): Testing framework, fixtures, parametrization
Weights & Biases (Tracking): Experiment tracking, model registry, hyperparameter logging
