Training · Tooling

Fine-Tuning Tools & Frameworks

Axolotl, Unsloth, LlamaFactory, and Hugging Face TRL — when to use each and how to configure them

6 frameworks
8 sections
Python first
Contents
  1. Tool landscape
  2. Framework comparison
  3. Axolotl deep dive
  4. Unsloth deep dive
  5. TRL & SFTTrainer
  6. Dataset formats
  7. Tools & frameworks
  8. References
01 — Ecosystem

Tool Landscape

Fine-tuning tools vary by abstraction level, ease of use, performance, and customization depth. No single tool is best for everyone — the choice depends on your model size, GPU budget, and expertise.

Trainer libraries (TRL, Unsloth): lower-level, with more control and less abstraction. You write Python training scripts.

Configuration-first tools (Axolotl, LlamaFactory): higher-level, with declarative YAML configs. You specify what to train; the tool handles how.

Tradeoff: Axolotl is easier for standard workflows but harder to customize. TRL is more flexible but requires more Python knowledge.

💡 Start with Axolotl or Unsloth: Most teams should begin with one of these. They handle 80% of use cases. Drop to TRL only if you need custom training logic.
02 — Feature Matrix

Framework Comparison

| Framework        | Ease             | Speed     | LoRA | Multi-GPU    | Custom Datasets |
|------------------|------------------|-----------|------|--------------|-----------------|
| Axolotl          | Easy (YAML)      | Good      | ✓    | ✓ FSDP       | ✓ Flexible      |
| Unsloth          | Medium (Python)  | Very fast | ✓    | Limited      | ✓               |
| LlamaFactory     | Easy (YAML + UI) | Good      | ✓    | ✓            | ✓               |
| TRL (SFTTrainer) | Hard (Python)    | Good      | ✓    | ✓ Accelerate | ✓ Full control  |
| LitGPT           | Medium           | Good      | ✓    | ✓            | ✓               |

Quick decision: Use Axolotl for most teams. Use Unsloth if speed is critical. Use TRL if you need custom training loops. Use LlamaFactory if you want a UI.

03 — Config-Driven

Axolotl Deep Dive

Axolotl is a community trainer built on Hugging Face Transformers. Declarative YAML configs specify models, datasets, training parameters, and LoRA settings. Popular because it handles distributed training seamlessly.

Example YAML Configuration

base_model: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
push_dataset_to_hub: false

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: json
    data_files:
      - data/train.json
    type:
      system_prompt: ""
      field_messages: messages
dataset_prepared_path: data/prepared
val_set_size: 0.1
output_dir: ./axolotl-output

sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir: null
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

num_epochs: 3
learning_rate: 0.0002
optimizer: adamw_8bit
lr_scheduler: cosine
warmup_steps: 100
micro_batch_size: 4
eval_batch_size: 4
gradient_accumulation_steps: 4
max_grad_norm: 1.0
weight_decay: 0.0

fp16: true
bf16: false
gradient_checkpointing: true
flash_attention: true
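The config reads data/train.json and expects each record to carry a messages field. A minimal sketch for producing such a file (the example records are hypothetical; only the messages field name is taken from the config):

```python
import json
import os

# Hypothetical training records: each carries a "messages" list,
# matching the field_messages: messages setting in the config.
records = [
    {
        "messages": [
            {"role": "user", "content": "What is LoRA?"},
            {"role": "assistant",
             "content": "LoRA trains small low-rank adapter matrices "
                        "on top of frozen base-model weights."},
        ]
    },
]

os.makedirs("data", exist_ok=True)
with open("data/train.json", "w") as f:
    json.dump(records, f, indent=2)
```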

Training

Run training with a single command: accelerate launch -m axolotl.cli.train config.yaml. Axolotl automatically handles multi-GPU/multi-node via Accelerate.

💡 Axolotl strengths: Distributed training, built-in eval loop, flexible dataset formats, flash attention support. Best for production teams.
04 — Ultra-Fast

Unsloth Deep Dive

Unsloth patches Transformers models for 2–4× speedup with lower memory via custom kernels. Integrates with Hugging Face trainers. Best for resource-constrained environments (Colab, limited GPUs).

Memory Optimization Tricks

Gradient checkpointing: trades compute for memory by recomputing activations during the backward pass.
FlashAttention 2: fused attention kernels that avoid materializing the full attention matrix, reducing activation memory and speeding up attention on long sequences.
BFloat16: halves weight and activation memory versus FP32, with minimal loss of precision.
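A rough back-of-envelope shows why these tricks matter. The sketch below counts only weights and optimizer state (activations and gradients are ignored), assuming AdamW keeps two FP32 moments per parameter:

```python
def train_memory_gb(n_params, weight_bytes=2, optim_state_bytes=8):
    """Rough weight + optimizer-state memory for fine-tuning, in GB.

    weight_bytes: 2 for bf16/fp16 weights, 4 for fp32.
    optim_state_bytes: standard AdamW keeps two fp32 moments per
    parameter (8 bytes); an 8-bit optimizer needs roughly 2 bytes.
    Activations and gradients are not counted here.
    """
    return n_params * (weight_bytes + optim_state_bytes) / 1e9

seven_b = 7e9
full = train_memory_gb(seven_b)                             # bf16 weights + fp32 AdamW
low = train_memory_gb(seven_b, optim_state_bytes=2)         # 8-bit optimizer states
print(f"{full:.0f} GB vs {low:.0f} GB")  # → 70 GB vs 28 GB
```

Under these assumptions an 8-bit optimizer alone cuts the weight-plus-optimizer footprint of a 7B model from about 70 GB to about 28 GB, which is why adamw_8bit shows up in most of the configs in this guide.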

Example: Training Llama 2 7B on Single GPU

from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load and patch the model (4-bit quantized weights)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=42,
)

# Standard TRL trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=True,
        optim="adamw_8bit",
        output_dir="output",
    ),
)
trainer.train()
⚠️ Unsloth single-GPU friendly: But doesn't scale well to multi-GPU. Use Axolotl for distributed training.
05 — Flexible

TRL & SFTTrainer

Hugging Face TRL (Transformer Reinforcement Learning) provides SFTTrainer for supervised fine-tuning and DPO/PPO trainers for preference-based alignment. It is lower-level than the config-driven tools: you write Python rather than YAML.

SFTTrainer Configuration

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    weight_decay=0.01,
    learning_rate=5e-5,
    fp16=True,
    optim="adamw_8bit",
    save_strategy="steps",
    save_steps=500,
    eval_strategy="steps",
    eval_steps=500,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=True,
)
trainer.train()

DPO training: For preference-based alignment, use TRL's DPOTrainer with (prompt, chosen, rejected) triplets. Simpler than RLHF.
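DPOTrainer consumes records with prompt, chosen, and rejected fields. A minimal sketch of assembling them from ranked answer pairs (the helper name and input data are ours; only the three field names follow TRL's expected schema):

```python
def to_dpo_records(ranked):
    """Convert (prompt, better_answer, worse_answer) tuples into the
    prompt/chosen/rejected records expected by TRL's DPOTrainer."""
    return [
        {"prompt": p, "chosen": good, "rejected": bad}
        for p, good, bad in ranked
    ]

# Hypothetical ranked pairs, e.g. from human or LLM-judge preferences
pairs = [
    ("Summarize LoRA in one line.",
     "LoRA fine-tunes small low-rank adapters on top of frozen weights.",
     "LoRA is a thing."),
]
records = to_dpo_records(pairs)
```

A list of such dicts can be wrapped with datasets.Dataset.from_list and passed to the trainer as its train_dataset.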

💡 When to use TRL: Custom training logic, reward modeling, DPO/PPO alignment. Standard SFT? Use Axolotl instead.
06 — Data Prep

Dataset Formats

Different frameworks prefer different formats. Most convert to a standard at runtime.

| Format               | Structure                                      | Best for                               | Example                                                |
|----------------------|------------------------------------------------|----------------------------------------|--------------------------------------------------------|
| Alpaca               | instruction, input, output fields              | Instruction-following                  | {"instruction": "...", "output": "..."}                |
| ShareGPT             | conversations array of from/value turns        | Multi-turn dialogue                    | {"conversations": [{"from": "human", "value": "..."}]} |
| ChatML               | role turns wrapped in <|im_start|>...<|im_end|> | Models trained with ChatML role tokens | <|im_start|>user ... <|im_end|>                        |
| Instruction-Response | paired (prompt, completion)                    | Basic SFT                              | {"prompt": "...", "completion": "..."}                 |

Recommendation: Use ShareGPT for multi-turn dialogue, Alpaca for single-turn instruction-following, and ChatML when the target model was trained with ChatML role tokens.
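Conversions between these formats are mechanical. A sketch (the helper name is ours) mapping Alpaca records to ShareGPT conversations:

```python
def alpaca_to_sharegpt(record):
    """Map one Alpaca record to a ShareGPT-style conversation.

    The instruction and optional input are joined into a single
    human turn; the output becomes the gpt turn.
    """
    human = record["instruction"]
    if record.get("input"):
        human += "\n\n" + record["input"]
    return {
        "conversations": [
            {"from": "human", "value": human},
            {"from": "gpt", "value": record["output"]},
        ]
    }

example = {"instruction": "Translate to French.",
           "input": "Hello",
           "output": "Bonjour"}
converted = alpaca_to_sharegpt(example)
```

The result loads directly as a single-turn ShareGPT conversation; records without an input field simply use the instruction alone as the human turn.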

07 — Ecosystem

Tools & Frameworks

Trainer
Axolotl
Config-driven, production-ready. Multi-GPU, flexible datasets. Best for most teams.
Trainer
Unsloth
2–4× speedup via custom kernels. Single GPU focus. Colab-friendly.
Trainer
LlamaFactory
Web UI + CLI. Config-driven like Axolotl. Good for no-code teams.
Library
TRL (Hugging Face)
SFT, DPO, PPO trainers. Lower-level. Most flexible.
Trainer
LitGPT
Lightning-based. Simple CLI, distributed training. Apache 2.0 license.
Platform
Hugging Face Hub
Model hosting, dataset management, training integrations.
Monitoring
Weights & Biases
Training visualization, hyperparameter tracking, model management.
Infrastructure
Modal
Serverless GPU training. Pay-per-second. Good for bursts.
08 — Further Reading

References

Documentation & Guides
Practitioner Writing