01 — Foundation
Why Python for ML and LLMs
Python has become the lingua franca for machine learning because it combines readability, extensive libraries, and interactivity. From data exploration to model training to deployment, Python ecosystems (PyData, FastAPI, pytest) provide production-grade tools.
The Python ML Stack
The modern Python ML workflow uses layered abstractions:
- Foundations: NumPy (arrays), pandas (DataFrames), SciPy (optimization)
- Visualization: matplotlib, seaborn, plotly
- Interactivity: Jupyter notebooks, IPython
- ML frameworks: PyTorch, TensorFlow, scikit-learn
- LLM integration: LangChain, LlamaIndex, Pydantic
- APIs & serving: FastAPI, Flask, Starlette
- Testing & quality: pytest, black, mypy
- Environment: venv, conda, uv, Poetry
💡 Pro tip: Learn NumPy deeply. Most ML libraries (PyTorch, pandas, scikit-learn) build on NumPy's array model, so mastering arrays pays off across the entire stack.
02 — Scientific Python
The Scientific Python Stack
PyData (NumPy, pandas, scipy, matplotlib) is the backbone of ML. These libraries provide efficient n-dimensional arrays, DataFrames, statistical functions, and visualization.
Core Libraries
- NumPy: N-dimensional arrays, linear algebra, random sampling
- pandas: DataFrames for tabular data, ETL operations
- SciPy: Scientific algorithms (optimization, integration, sparse matrices)
- scikit-learn: Classical ML (clustering, classification, regression)
- matplotlib: Low-level plotting, full customization
- seaborn: Statistical visualization, high-level interfaces
Quick Setup
# Install PyData stack
pip install numpy pandas scipy scikit-learn matplotlib seaborn jupyter
# Standard imports in ML notebooks
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
# Set visualization defaults
sns.set_style("darkgrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 10
03 — Data Structures
NumPy Arrays & pandas DataFrames
NumPy arrays are the foundational data structure for all numerical computing. pandas DataFrames extend arrays to handle tabular data with named columns and mixed types.
NumPy Essentials
# NumPy arrays are the foundation
import numpy as np
# Create arrays
a = np.array([1, 2, 3]) # 1D array
b = np.array([[1, 2], [3, 4]]) # 2D array (matrix)
c = np.zeros((3, 4)) # 3x4 matrix of zeros
d = np.ones((2, 3)) # 2x3 matrix of ones
e = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Operations
result = a + 5 # Broadcast scalar: [6, 7, 8]
result = a * a # Element-wise multiply: [1, 4, 9]
result = np.dot(a, a) # Dot product (14)
result = np.linalg.norm(a) # L2 norm
# Reshaping
x = np.arange(12).reshape(3, 4) # 3x4 matrix
flat = x.flatten() # 1D array
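Beyond arithmetic and reshaping, boolean masking and broadcasting cover most day-to-day array work. A minimal sketch (variable names illustrative):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)  # 3x4 matrix: rows 0-3, 4-7, 8-11

# Boolean masking: select elements by condition
evens = x[x % 2 == 0]  # 1D array of even values

# Broadcasting: a (3, 1) column op applies across all 4 columns
row_means = x.mean(axis=1, keepdims=True)  # Shape (3, 1)
centered = x - row_means                   # Each row now sums to 0

# Axis-wise reductions
col_max = x.max(axis=0)  # Max of each column
```

The `keepdims=True` is what makes the subtraction broadcast row-wise instead of raising a shape error.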
pandas DataFrames for ML
# pandas DataFrames for tabular data
import pandas as pd
# Create from dict
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Carol'],
'age': [25, 30, 28],
'salary': [50000, 60000, 55000]
})
# Read CSV
df = pd.read_csv('data.csv')
# Explore
print(df.head()) # First rows
print(df.describe()) # Statistics
print(df.dtypes) # Column types
df.info() # Non-null counts & memory usage (prints directly)
# Transform
df['bonus'] = df['salary'] * 0.1 # New column
df_filtered = df[df['age'] > 25] # Filter rows
grouped = df.groupby('age')['salary'].mean() # Aggregations
df_clean = df.dropna() # Remove NaN
# Export
df.to_csv('output.csv', index=False)
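groupby and merge round out the transforms above. A small sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'dept': ['eng', 'eng', 'sales'],
    'salary': [50000, 60000, 55000],
})
depts = pd.DataFrame({
    'dept': ['eng', 'sales'],
    'location': ['NYC', 'SF'],
})

# Aggregate: mean salary per department
avg = df.groupby('dept')['salary'].mean()

# Join: attach department metadata to each row
joined = df.merge(depts, on='dept', how='left')

# Chain transforms with assign (returns a new DataFrame)
df2 = df.assign(bonus=lambda d: d['salary'] * 0.1)
```

`how='left'` keeps every row of `df` even when no matching `dept` exists in `depts`.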
04 — Visualization
matplotlib, seaborn, and plotly
Visualization is critical for ML. matplotlib offers low-level control, seaborn provides statistical interfaces, and plotly creates interactive dashboards.
Visualization Workflow
# matplotlib for publication-quality plots
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Basic plot
x = np.linspace(0, 10, 50)
y = np.sin(x) # Example data
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'o-', label='data')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.legend()
plt.title('My Chart')
plt.savefig('plot.png', dpi=300)
plt.show()
# seaborn for statistical plots
sns.set_style("whitegrid")
sns.scatterplot(data=df, x='age', y='salary', hue='dept')
plt.show()
# Distribution plot
sns.histplot(data=df, x='salary', kde=True)
plt.show()
# Heatmap (correlation matrix)
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.show()
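To compare several views at once, subplots arrange multiple axes on one figure. A minimal sketch (filename illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, safe for scripts/CI
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure, two side-by-side axes
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(x, np.sin(x))
axes[0].set_title('sin(x)')
axes[1].plot(x, np.cos(x))
axes[1].set_title('cos(x)')
fig.tight_layout()
fig.savefig('panels.png', dpi=150)
```

Working through `fig`/`axes` objects (rather than the `plt.*` state machine) scales better once a figure has more than one panel.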
05 — Interactive Development
Jupyter Notebooks & IPython
Jupyter enables interactive code exploration and documentation. IPython provides a powerful interactive shell with magic commands, rich output, and debugging tools.
Jupyter Workflow
- Start server: jupyter notebook or jupyter lab
- Cells: Code cells execute independently with shared state
- Output: Tables, plots, and HTML render inline
- Markdown: Document code with rich text, LaTeX, and images
- Kernels: Run Python, R, Julia, and other languages
IPython Magic Commands
# IPython magic commands (% = line, %% = cell)
%timeit sum(range(100)) # Timing
%time result = expensive_function() # Single run
%matplotlib inline # Display plots
%load_ext autoreload
%autoreload 2 # Auto-reload modules
%pwd # Print working directory
%cd /path # Change directory
%ls # List files
%debug # Enter debugger
# Run external script
%run my_script.py
# Get help
?np.array
??np.array # Shows source code
06 — Environment Management
Virtual Environments & Dependency Management
Isolating project dependencies prevents conflicts. Common choices are venv (built into Python), conda (full scientific stack with pre-compiled packages), Poetry (dependency management with lock files), and uv (very fast, all-in-one tooling).
Environment Managers Comparison
| Tool   | Setup              | Speed     | Best For                            |
|--------|--------------------|-----------|-------------------------------------|
| venv   | Built into Python  | Fast      | Simple projects, quick sandboxes    |
| conda  | Separate installer | Slow      | Data science, pre-compiled packages |
| Poetry | pip install poetry | Medium    | Production apps, lock files         |
| uv     | pip install uv     | Very fast | Modern projects, speed-critical     |
venv Workflow
# Create virtual environment
python -m venv venv
# Activate
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
# Install packages
pip install numpy pandas torch
# Export dependencies
pip freeze > requirements.txt
# Reproduce environment
pip install -r requirements.txt
# Deactivate
deactivate
uv Quick Start (Modern Choice)
# uv is a fast, unified Python tool
pip install uv
# Create project with uv
uv init my_project
cd my_project
# Sync dependencies
uv sync
# Add a package
uv add numpy pandas
# Run Python in venv
uv run python script.py
# Lock file (reproducible)
uv lock
07 — Modern ML Toolchain
The Complete ML Development Stack
Beyond data processing, modern ML projects require frameworks (PyTorch), testing (pytest), code quality (black, mypy), and experiment tracking (Weights & Biases). FastAPI enables production APIs.
ML Frameworks
- PyTorch: Dynamic computation graphs, research-friendly, GPU support
- TensorFlow: Production-grade, serving infrastructure, lower-level API
- scikit-learn: Classical ML, preprocessing, evaluation metrics
- Hugging Face transformers: Pre-trained models, fine-tuning, inference
- LangChain/LlamaIndex: LLM applications, RAG, agent orchestration
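The scikit-learn entry above (classical ML, preprocessing, metrics) is often combined into a single Pipeline so that scaling is fit only on training data. A minimal sketch on synthetic data:

```python
# scikit-learn: preprocessing + model chained in one Pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ('scale', StandardScaler()),    # Fit on train only; applied to test
    ('clf', LogisticRegression()),
])
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)  # Mean accuracy on held-out data
```

Fitting the scaler inside the pipeline avoids leaking test-set statistics into preprocessing.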
PyTorch Example
# PyTorch for neural networks
import torch
import torch.nn as nn
import torch.optim as optim
# Define model
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Training loop
model = SimpleNet(10, 32, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
x_train = torch.randn(64, 10)         # Example data: 64 samples, 10 features
y_train = torch.randint(0, 2, (64,))  # Example labels: 2 classes
for epoch in range(100):
    optimizer.zero_grad()
    output = model(x_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
Testing & Code Quality
# pytest for testing
import pytest
def test_sum():
    assert sum([1, 2, 3]) == 6
# Run tests
pytest test_script.py -v
# Code formatting (black)
black my_script.py # Auto-formats code
# Type checking (mypy)
mypy my_script.py # Static type analysis
# Linting (ruff)
ruff check . # Fast Python linter
# Profiling (cProfile)
import cProfile
cProfile.run('expensive_function()') # See bottlenecks
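Beyond single asserts, pytest's parametrize decorator runs one test body across many inputs. A small sketch (the function under test is illustrative):

```python
import pytest

def normalize(text: str) -> str:
    """Illustrative function under test: trim and lowercase."""
    return text.strip().lower()

# One test, three input/expected pairs
@pytest.mark.parametrize("raw, expected", [
    ("  Hello ", "hello"),
    ("WORLD", "world"),
    ("", ""),
])
def test_normalize(raw, expected):
    assert normalize(raw) == expected
```

Each pair shows up as a separate test in `pytest -v` output, so a single failing input is pinpointed immediately.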
Production APIs with FastAPI
# FastAPI for serving ML models
from fastapi import FastAPI
from pydantic import BaseModel
import torch
app = FastAPI()
# Load model once on startup
model = None
@app.on_event("startup")
async def load_model():
    global model
    model = torch.load("model.pth")
    model.eval()

class PredictRequest(BaseModel):
    input: list[float]

@app.post("/predict")
async def predict(request: PredictRequest):
    with torch.no_grad():
        output = model(torch.tensor(request.input))
    return {"prediction": output.tolist()}
# Run: uvicorn script:app --reload
Experiment Tracking
Track hyperparameters, metrics, and models with Weights & Biases or MLflow.
# Weights & Biases for experiment tracking
import wandb
wandb.init(project="my-ml-project")
# Log hyperparameters
wandb.config.learning_rate = 0.001
wandb.config.batch_size = 32
# Log during training
for epoch in range(100):
    loss = train_step()
    wandb.log({"loss": loss, "epoch": epoch})
# Log model
wandb.save("model.pth")
wandb.finish()