Axolotl, Unsloth, LlamaFactory, and Hugging Face TRL — when to use each and how to configure them
Fine-tuning tools vary by abstraction level, ease of use, performance, and customization depth. No single tool is best for everyone — the choice depends on your model size, GPU budget, and expertise.
Trainer libraries (TRL, Unsloth): lower-level, with more control and less abstraction; you write Python training code.
Configuration-first tools (Axolotl, LlamaFactory): higher-level, with declarative YAML configs; you specify what to train and the tool handles how.
Tradeoff: Axolotl is easier for standard workflows but harder to customize. TRL is more flexible but requires more Python knowledge.
| Framework | Ease | Speed | LoRA | Multi-GPU | Custom Datasets |
|---|---|---|---|---|---|
| Axolotl | Easy (YAML) | Good | ✓ | ✓ FSDP | ✓ Flexible |
| Unsloth | Medium (Python) | Very Fast | ✓ | Limited | ✓ |
| LlamaFactory | Easy (YAML + UI) | Good | ✓ | ✓ | ✓ |
| TRL (SFTTrainer) | Hard (Python) | Good | ✓ | ✓ | ✓ Full control |
| LitGPT | Medium | Good | ✓ | ✓ | ✓ |
Quick decision: Use Axolotl for most teams. Use Unsloth if speed is critical. Use TRL if you need custom training loops. Use LlamaFactory if you want a UI.
Axolotl is a community-maintained fine-tuning framework built on Hugging Face Transformers. Declarative YAML configs specify the model, datasets, training parameters, and LoRA settings. It is popular because it handles distributed training with minimal setup.
Run training with a single command: `accelerate launch -m axolotl.cli.train config.yaml`. Axolotl handles multi-GPU and multi-node training automatically via Accelerate.
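A minimal LoRA config gives a feel for the declarative style. This is a hedged sketch: the field names follow Axolotl's documented schema, but the model name and dataset path are placeholders and exact keys can vary between versions.

```yaml
# Sketch of an Axolotl LoRA fine-tuning config (placeholder model/dataset).
base_model: NousResearch/Llama-2-7b-hf   # placeholder base model
load_in_8bit: true                       # quantize weights to save memory

datasets:
  - path: ./data/train.jsonl             # placeholder dataset path
    type: alpaca                         # instruction/input/output format

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 0.0002
bf16: true
output_dir: ./outputs/lora-run
```

Passing this file to `accelerate launch -m axolotl.cli.train` is all the "training loop" a user writes.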
Unsloth patches Transformers models for 2–4× speedup with lower memory via custom kernels. Integrates with Hugging Face trainers. Best for resource-constrained environments (Colab, limited GPUs).
Gradient checkpointing: trades compute for memory by recomputing activations during the backward pass instead of storing them. FlashAttention-2: avoids materializing the full attention matrix, so attention memory scales linearly rather than quadratically with sequence length. BFloat16: halves weight memory versus float32 with minimal precision loss.
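The bfloat16 saving can be made concrete with rough arithmetic. This is a simplified estimate that counts only parameter storage, ignoring optimizer state, gradients, and activations:

```python
def param_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

SEVEN_B = 7e9
fp32 = param_memory_gib(SEVEN_B, 4)  # float32: 4 bytes per parameter
bf16 = param_memory_gib(SEVEN_B, 2)  # bfloat16: 2 bytes per parameter

print(f"7B weights, fp32: {fp32:.1f} GiB")  # ~26.1 GiB
print(f"7B weights, bf16: {bf16:.1f} GiB")  # ~13.0 GiB
```

Halving bytes per parameter halves the footprint, which is often the difference between fitting a 7B model on a single 24 GB GPU or not.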
Hugging Face TRL (Transformers Reinforcement Learning) provides SFTTrainer for supervised fine-tuning and PPO/DPO trainers for preference-based alignment. It is lower-level than the YAML tools: you write the training script in Python.
DPO training: for preference-based alignment, use TRL's DPOTrainer with (prompt, chosen, rejected) triplets. It is simpler to run than PPO-based RLHF because it needs no separate reward model and no online sampling.
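The DPO objective itself is small enough to write out. Below is a minimal sketch of the per-example loss, assuming the four sequence log-probabilities have already been computed by the policy and the frozen reference model (beta is the usual DPO temperature; in practice DPOTrainer computes all of this for you):

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from sequence log-probabilities.

    policy_*/ref_* are log p(completion | prompt) under the policy and
    the frozen reference model; beta scales the implicit reward margin.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written stably as log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Loss shrinks as the policy prefers 'chosen' more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # positive margin: low loss
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # negative margin: high loss
```

The gradient of this loss pushes the policy to increase the likelihood gap between chosen and rejected completions relative to the reference model.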
Different frameworks prefer different formats. Most convert to a standard at runtime.
| Format | Structure | Best For | Example |
|---|---|---|---|
| Alpaca | instruction, input, output | Instruction-following | {"instruction": "...", "output": "..."} |
| ShareGPT | conversations array with role/content | Multi-turn dialogue | {"conversations": [{"from": "human", "value": "..."}]} |
| ChatML | <|im_start|>system/user/assistant<|im_end|> | Models whose chat template uses ChatML (e.g., Qwen) | Special tokens for role separation |
| Instruction-Response | Paired (prompt, completion) | Basic SFT | {"prompt": "...", "completion": "..."} |
Recommendation: use ShareGPT for multi-turn dialogue, Alpaca for single-turn instruction-following, and ChatML when the target model's chat template expects it.
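To make the format differences concrete, here is a small converter from a ShareGPT-style record to a ChatML transcript. The `from` → role mapping follows common convention but is an assumption, not a fixed standard, so check your dataset's actual role names:

```python
# Conventional ShareGPT role names mapped to ChatML roles (assumption).
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_chatml(record: dict) -> str:
    """Render a ShareGPT conversations array as a ChatML transcript."""
    parts = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

example = {"conversations": [
    {"from": "human", "value": "What is LoRA?"},
    {"from": "gpt", "value": "A parameter-efficient fine-tuning method."},
]}
print(sharegpt_to_chatml(example))
```

Most frameworks perform a conversion like this internally via the tokenizer's chat template, so you rarely need to render the special tokens by hand.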