Foundations · Python

Python Ecosystem for ML

NumPy arrays, pandas DataFrames, Jupyter notebooks, and the modern ML toolchain for building with LLMs

12 libraries · 7 sections · Python-first examples
Contents
  1. Why Python for ML
  2. Scientific Python
  3. NumPy & pandas
  4. Visualization
  5. Jupyter notebooks
  6. Environment management
  7. Modern toolchain
01 — Foundation

Why Python for ML and LLMs

Python has become the lingua franca of machine learning because it combines readability, an extensive library ecosystem, and interactive workflows. From data exploration to model training to deployment, the Python ecosystem (PyData, FastAPI, pytest) provides production-grade tools at every stage.

The Python ML Stack

The modern Python ML workflow is built from layered abstractions: NumPy arrays at the bottom, pandas and scikit-learn on top of them, and deep-learning frameworks such as PyTorch above those.

💡 Pro tip: Learn NumPy deeply. Most ML libraries (PyTorch, pandas, scikit-learn) build on NumPy's array model. Mastering arrays = understanding everything else.
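
The tip above is easiest to see in action with broadcasting, the core rule of NumPy's array model. A minimal sketch (all values are illustrative):

```python
import numpy as np

# Broadcasting: NumPy stretches compatible shapes instead of copying data
row = np.array([1.0, 2.0, 3.0])        # shape (3,)
col = np.array([[10.0], [20.0]])       # shape (2, 1)

grid = col + row                        # shapes (2,1) + (3,) -> (2, 3)
print(grid)
# [[11. 12. 13.]
#  [21. 22. 23.]]

# The same rule powers feature scaling: subtract a per-column mean
X = np.array([[1.0, 10.0], [3.0, 30.0]])
centered = X - X.mean(axis=0)           # mean has shape (2,), broadcast over rows
```

The same shape-stretching rule underlies feature scaling, masking, and most vectorized ML code, which is why arrays repay deep study.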
02 — Scientific Python

The Scientific Python Stack

PyData (NumPy, pandas, SciPy, Matplotlib) is the backbone of ML in Python. These libraries provide efficient n-dimensional arrays, DataFrames, statistical functions, and visualization.

Quick Setup

# Install the PyData stack (shell)
pip install numpy pandas scipy scikit-learn matplotlib seaborn jupyter

# Standard imports in ML notebooks
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization defaults
sns.set_style("darkgrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 10
03 — Data Structures

NumPy Arrays & pandas DataFrames

NumPy arrays are the foundational data structure for numerical computing in Python. pandas DataFrames extend arrays to tabular data with named columns and mixed types.

NumPy Essentials

# NumPy arrays are the foundation
import numpy as np

# Create arrays
a = np.array([1, 2, 3])          # 1D array
b = np.array([[1, 2], [3, 4]])   # 2D array (matrix)
c = np.zeros((3, 4))             # 3x4 matrix of zeros
d = np.ones((2, 3))              # 2x3 matrix of ones
e = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]

# Operations
result = a + 5                   # Broadcast scalar: [6, 7, 8]
result = a * a                   # Element-wise multiply: [1, 4, 9]
result = np.dot(a, a)            # Dot product (14)
result = np.linalg.norm(a)       # L2 norm: sqrt(14)

# Reshaping
x = np.arange(12).reshape(3, 4)  # 3x4 matrix
flat = x.flatten()               # 1D array
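
Two idioms worth adding to the essentials above are boolean masking and fancy indexing, which most ML preprocessing relies on. A short sketch with made-up values:

```python
import numpy as np

scores = np.array([55, 91, 78, 42, 88])

# Boolean mask: an element-wise condition selects matching entries
passing = scores[scores >= 60]             # array([91, 78, 88])

# Fancy indexing: pick arbitrary positions by integer index
top_two = scores[np.argsort(scores)[-2:]]  # two highest scores, ascending

# Conditional update on a copy: floor low scores at 50
clipped = scores.copy()
clipped[clipped < 50] = 50
```

Both idioms return results without writing a Python loop, which is the main reason vectorized NumPy code is fast.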

pandas DataFrames for ML

# pandas DataFrames for tabular data
import pandas as pd

# Create from dict
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 28],
    'salary': [50000, 60000, 55000]
})

# Read CSV
df = pd.read_csv('data.csv')

# Explore
print(df.head())      # First rows
print(df.describe())  # Statistics
print(df.dtypes)      # Column types
df.info()             # Columns, dtypes, memory usage (prints directly)

# Transform
df['bonus'] = df['salary'] * 0.1              # New column
df_filtered = df[df['age'] > 25]              # Filter rows
grouped = df.groupby('age')['salary'].mean()  # Aggregations
df_clean = df.dropna()                        # Remove rows with NaN

# Export
df.to_csv('output.csv', index=False)
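
The groupby line above is one half of the split-apply-combine pattern; joining frames with merge is the other everyday operation. A sketch (column names and values are illustrative):

```python
import pandas as pd

# Two small frames sharing the key column 'dept'
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'dept': ['eng', 'eng', 'sales'],
    'salary': [50000, 60000, 55000],
})
budgets = pd.DataFrame({'dept': ['eng', 'sales'], 'budget': [200000, 80000]})

# SQL-style left join on the shared key
merged = employees.merge(budgets, on='dept', how='left')

# Split-apply-combine: average salary per department
avg = merged.groupby('dept')['salary'].mean()
print(avg['eng'])   # 55000.0
```

`how='left'` keeps every employee row even when a department has no budget entry; `how='inner'` would drop unmatched rows instead.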
04 — Visualization

matplotlib, seaborn, and plotly

Visualization is critical for ML. matplotlib offers low-level control, seaborn provides statistical interfaces, and plotly creates interactive dashboards.

Visualization Workflow

# matplotlib for publication-quality plots
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Basic plot (x and y are any matching 1D sequences)
x = np.linspace(0, 10, 50)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'o-', label='data')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.title('My Chart')
plt.savefig('plot.png', dpi=300)
plt.show()

# seaborn for statistical plots (assumes df has these columns)
sns.set_style("whitegrid")
sns.scatterplot(data=df, x='age', y='salary', hue='dept')
plt.show()

# Distribution plot
sns.histplot(data=df, x='salary', kde=True)
plt.show()

# Heatmap (correlation matrix; numeric_only skips string columns)
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.show()
05 — Interactive Development

Jupyter Notebooks & IPython

Jupyter enables interactive code exploration and documentation. IPython provides a powerful interactive shell with magic commands, rich output, and debugging tools.

IPython Magic Commands

# IPython magic commands (% = line magic, %% = cell magic)
%timeit sum(range(100))              # Average over many runs
%time result = expensive_function()  # Time a single run
%matplotlib inline                   # Display plots in the notebook
%load_ext autoreload
%autoreload 2                        # Auto-reload edited modules
%pwd                                 # Print working directory
%cd /path                            # Change directory
%ls                                  # List files
%debug                               # Enter debugger after an exception

# Run an external script
%run my_script.py

# Get help
?np.array                            # Docstring
??np.array                           # Shows source code
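
Outside IPython, the closest stdlib equivalent of %timeit is the timeit module. A sketch (square_all is a made-up example function):

```python
import timeit

# %timeit is IPython-only; the stdlib timeit module works in plain scripts
elapsed = timeit.timeit('sum(range(100))', number=10_000)
print(f"10,000 runs took {elapsed:.4f}s")

def square_all(n):
    return [i * i for i in range(n)]

# Time a function with arguments by wrapping it in a lambda
per_call = timeit.timeit(lambda: square_all(1_000), number=100) / 100
print(f"~{per_call * 1e6:.1f} µs per call")
```

Unlike a naive time.time() pair, timeit repeats the statement many times and avoids common measurement pitfalls such as garbage-collection interference.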
06 — Environment Management

Virtual Environments & Dependency Management

Isolating project dependencies prevents version conflicts. Common choices are venv (built into Python), conda (full scientific stack with pre-compiled binaries), Poetry (lock files and packaging), and uv (very fast, unified tooling).

Environment Managers Comparison

Tool    | Setup              | Speed     | Best for
venv    | Built into Python  | Fast      | Simple projects, quick sandboxes
conda   | Separate installer | Slow      | Data science, pre-compiled packages
Poetry  | pip install poetry | Medium    | Production apps, lock files
uv      | pip install uv     | Very fast | Modern projects, speed-critical

venv Workflow

# Create virtual environment
python -m venv venv

# Activate
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install packages (the pip package for PyTorch is 'torch')
pip install numpy pandas torch

# Export dependencies
pip freeze > requirements.txt

# Reproduce environment
pip install -r requirements.txt

# Deactivate
deactivate

uv Quick Start (Modern Choice)

# uv is a fast, unified Python tool
pip install uv

# Create a project with uv
uv init my_project
cd my_project

# Sync dependencies
uv sync

# Add packages
uv add numpy pandas

# Run Python inside the project venv
uv run python script.py

# Lock file (reproducible installs)
uv lock
07 — Modern ML Toolchain

The Complete ML Development Stack

Beyond data processing, modern ML projects require frameworks (PyTorch), testing (pytest), code quality (black, mypy), and experiment tracking (Weights & Biases). FastAPI enables production APIs.

PyTorch Example

# PyTorch for neural networks
import torch
import torch.nn as nn
import torch.optim as optim

# Define model
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Toy training data: 64 samples, 10 features, 2 classes
x_train = torch.randn(64, 10)
y_train = torch.randint(0, 2, (64,))

# Training loop
model = SimpleNet(10, 32, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    optimizer.zero_grad()
    output = model(x_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
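
To demystify what zero_grad / backward / step are doing, the same cycle can be written by hand in NumPy for least-squares linear regression. This is a sketch on toy data, not PyTorch's actual implementation:

```python
import numpy as np

# Toy regression problem: recover known weights from noisy observations
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    pred = X @ w                            # forward pass
    grad = 2 * X.T @ (pred - y) / len(y)    # "backward": gradient of MSE w.r.t. w
    w -= lr * grad                          # "optimizer.step()"

print(w)   # close to true_w
```

PyTorch automates exactly this: backward() computes the gradient of the loss for every parameter, and step() applies the update rule (Adam, in the example above, rather than plain gradient descent).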

Testing & Code Quality

# pytest for testing (save as test_script.py)
import pytest

def test_sum():
    assert sum([1, 2, 3]) == 6

# Shell commands for the quality toolchain:
#   pytest test_script.py -v   # Run tests
#   black my_script.py         # Auto-format code
#   mypy my_script.py          # Static type analysis
#   ruff check .               # Fast Python linter

# Profiling (cProfile)
import cProfile
cProfile.run('expensive_function()')   # See bottlenecks
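
The single test above scales up with pytest's parametrization and fixtures. A sketch (normalize is a made-up function under test):

```python
import pytest

def normalize(values):
    """Scale a list of numbers linearly to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# One test function, many cases
@pytest.mark.parametrize("values,expected", [
    ([0, 5, 10], [0.0, 0.5, 1.0]),
    ([2, 4], [0.0, 1.0]),
])
def test_normalize(values, expected):
    assert normalize(values) == expected

# Fixtures provide reusable setup shared across tests
@pytest.fixture
def sample_data():
    return [1, 2, 3]

def test_with_fixture(sample_data):
    assert normalize(sample_data) == [0.0, 0.5, 1.0]
```

Each parametrize case is reported as a separate test, so a failing input is identified immediately rather than hidden inside a loop.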

Production APIs with FastAPI

# FastAPI for serving ML models
from fastapi import FastAPI
from pydantic import BaseModel
import torch

app = FastAPI()

# Load model once on startup
model = None

@app.on_event("startup")
async def load_model():
    global model
    model = torch.load("model.pth")
    model.eval()

class PredictRequest(BaseModel):
    input: list[float]

@app.post("/predict")
async def predict(request: PredictRequest):
    with torch.no_grad():
        output = model(torch.tensor(request.input))
    return {"prediction": output.tolist()}

# Run: uvicorn script:app --reload

Experiment Tracking

Track hyperparameters, metrics, and models with Weights & Biases or MLflow.

# Weights & Biases for experiment tracking
import wandb

wandb.init(project="my-ml-project")

# Log hyperparameters
wandb.config.learning_rate = 0.001
wandb.config.batch_size = 32

# Log during training
for epoch in range(100):
    loss = train_step()
    wandb.log({"loss": loss, "epoch": epoch})

# Log model
wandb.save("model.pth")
wandb.finish()

Python ML Ecosystem Tools

NumPy (Arrays): N-dimensional arrays, linear algebra, numerical computing foundation
pandas (DataFrames): Tabular data manipulation, ETL, exploratory data analysis
matplotlib (Plotting): Publication-quality plots, low-level control, customization
seaborn (Visualization): Statistical plots, attractive defaults, rapid visualization
Jupyter (Notebooks): Interactive coding, documentation, exploratory analysis
PyTorch (Deep Learning): Dynamic graphs, GPU support, research-friendly framework
scikit-learn (Classical ML): Classification, clustering, preprocessing, evaluation metrics
LangChain (LLMs): LLM chains, RAG, agent orchestration, integrations
uv (Environment): Fast dependency management and virtual environments
FastAPI (APIs): Modern web framework for ML model serving
pytest (Testing): Testing framework, fixtures, parametrization
Weights & Biases (Tracking): Experiment tracking, model registry, hyperparameter logging
